Exploring
University Mathematics 1
N.J.Hardiman
Bedford College, London
The Commonwealth and Inter-
national Library of Science
Technology Engineering and
Liberal Studies
m
x
■o
3
c
3
<"
—I
3"
3
0)
o'
(J)
510, I
THE COMMONWEALTH AND I NTER NATIONAL LI BR ARY
Joint Chairmen of the Honorary Editorial Advisory Board
SIR ROBERT ROBINSON, O.M.. F.R.S.. LONDON
DEAN ATHELSTAN S PI LH AU S, M I N N E S OTA
Publisher: ROBERT MAXWELL, M.C., M.P.
MATHEMATICS DIVISION
General Editors: W. J. LANGFORD. E. A. MAXWELL
EXPLORING UNIVERSITY
MATHEMATICS 1
EXPLORING UNIVERSITY
MATHEMATICS 1
LECTURES GIVEN AT BEDFORD COLLEGE, LONDON
by
P. CHADWICK J. R. ELLIS
J. H. E. COHN G. T. KNEEBONE
H. G. EGGLESTON M. LEVISON
S. J. TAYLOR
Edited by
N. J. HARDIMAN
PERGAMON PRESS
OXFORD • LONDON • EDINBURGH • NEW YORK
TORONTO • SYDNEY • PARIS • BRAUNSCHWEIG
E»fe7 \^o5g-
5/o
Pergamon Press Ltd., Headington Hill Hall, Oxford
4 & 5 Fitzroy Square, London W.l
Pergamon Press (Scotland) Ltd., 2 & 3 Teviot Place, Edinburgh 1
Pergamon Press Inc., 44-01 21st Street, Long Island City,
New York 11101
Pergamon of Canada Ltd., 6 Adelaide Street East, Toronto, Ontario
Pergamon Press (Aust.) Pty. Ltd., 20-22 Margaret Street, Sydney,
New South Wales
Pergamon Press S.A.R.L., 24 rue des Ecoles, Paris 5 e
Vieweg & Sohn GmbH, Burgplatz 1, Braunschweig
Copyright © 1967 Pergamon Press Ltd.
tion 1967
1 3 NOV 198fe
This book is sold subject to the condition
that it shall not, by way of trade, be lent,
resold, hired out, or otherwise disposed
of without the publisher's consent,
in any form of binding or cover
other than that in which
it is published.
(2852/67) •
Contents
Foreword
Editorial
1 SETS AND FUNCTIONS 1
G. T. Kneebone, M. Sc., Ph. D.
Reader in Foundations of Mathematics
at Bedford College, London.
2 SPECIAL RELATIVITY AND SOME
APPLICATIONS 9
J. R. Ellis, Ph.D.
Assistant Lecturer in Applied Mathematics
at Bedford College. London.
5 SOME PROPERTIES OF INTEGERS AND PRIMES 31
S.J.Taylor, Ph.D.
Professor of Pure Mathematics
at Westfield College, London.
4 WAVES 44
P. Chadwick. Ph. D.
Professor of Mathematics in the University of
East Anglia at Norwich. (Formerly Senior Lecturer
in Applied Mathematics at Sheffield University.)
5 SQUARE FIBONACCI NUMBERS 69
J.H.E.Cohn, M.A., D.Phil.
Lecturer in Mathematics
at Bedford College, London.
VI
CONTENTS
6 DIGITAL COMPUTERS AND THEIR
APPLICATIONS
M. Levison, Ph.D.
Lecturer and Acting Head of Department of
Computer Science, Birkbeck College, London.
7 THE ISOPERIMETRIC PROBLEM
H. G. Eggleston, M. A., Sc. D.
Professor of Pure Mathematics
at Bedford College, London.
83
95
Foreword
By Professor H. G. Eggleston
Head of the Mathematics Department, Bedford College,
London
Tun lectures in this book were given at the 1965 Bedford
College Easter Conference in Mathematics. These confer-
ences have been held annually since 1963 when we initiated
the programme with what we believe to have been the first
conference of its kind held by any university in Britain.
Our aims were (a) to increase the contact between schools and
universities, (b) to enable pupils from different schools to
meet and discuss topics of common interest and (c) to
stimulate interest by introducing students in their last or
penultimate years at school to branches of mathematics which
would be novel to them. The lectures are primarily designed
for students about to embark upon a degree course of which
mathematics is a major part. As well as students, teachers
of mathematics have also taken part in the conferences.
Although those attending are drawn from schools in all parts
of the country the number involved each year is, unfortu-
nately, very limited. This year the organizers of the confer-
ences felt that these lectures, given by professional mathe-
maticians on subjects of current mathematical interest and
yet assuming little mathematical background, would be of
interest to a wider public. It was therefore decided to
publish them in a book and so increase the "audience" many
times.
vn
Editorial
The seven lectures comprising the chapters of this book
formed the programme for the 1965 Easter Conference in
Mathematics at Bedford College, London. Those attending
the conference were mostly in their last year at school and
intending to read for a degree in mathematics at university
the following year, or teachers concerned with this level of
work in the schools. The scope of the lectures is fairly wide
and is divided between pure mathematics and applied
mathematics, with a natural bias towards the former at
this level. Each lecture is quite independent, so that getting
"lost" in one lecture does not mean that a subsequent lecture
is unintelligible. (This, of course, is less important in the
book, as the reader has time to take each lecture as
slowly as necessary for complete comprehension.) Wherever
possible, a list of suggestions for further reading is given
at the end of the chapter.
Four of the lectures were given by members of the Mathe-
matics Department of Bedford College, whilst for the other
three we were very pleased to welcome staff from other
colleges or universities who were interested in taking part
in the conference. Each lecturer chose a subject in which
In- is an expert, either as a teacher or as a research worker.
As far as possible in each conference, at least one lecture
is given on the lecturer's research work, although the topics
which may be presented at this level are rather limited.
This year Dr. Cohn talked about a paper which he had
published as recently as 1963. Although the problem on
Square Fibonacci Numbers is simple to state and can be
IX
EDITORIAL
solved by elementary methods, the solution has eluded able
mathematicians for many years and was only discovered by
the exercise of considerable ingenuity.
Professor Eggleston's lecture on the Isoperimetric Problem
was given to an invited audience of the more advanced pupils
(i.e. those who had already passed A-level), the school
teachers and university lecturers. The mathematical level
of this lecture is slightly higher than that of the other six
which comprised the main programme for the conference, but
taken at the reader's own pace it should be comprehensible
and of considerable interest to all readers.
I should like to thank all those who have taken part in the
writing and proofreading of the book, and the Pergamon
Press for the care which they have given to the production
of the book.
N. J. Hardiman
Bedford College, London
June, 1965
CHAPTER 1
Sets and Functions
G. T. Kneebone
As everyone knows who has studied the calculus, one of
the most important concepts in mathematics is that of
functional dependence. In this lecture I propose to glance
at the general notion of function in order to see how it has
been reinterpreted as a result of investigations made in recent
times into the foundations of mathematics.
Up to about 50 years ago it was taken generally for granted
that the business of the mathematician is simply to do
mathematics, that is to say either to use his expert knowledge
in applying mathematical techniques to problems of science
or engineering or else to extend mathematical knowledge
itself by developing entirely new theories or adding further
details to theories already in existence. From time to time
a philosopher or philosophically-minded mathematician might
crop up, who would ask questions concerning what mathe-
matics is about or what constitutes valid mathematical proof —
but such persons were thought of as being very much on the
fringe of the subject. But today all this is radically changed
and foundational studies are accepted, and even welcomed,
as belonging to the main body of mathematics. Indeed, some
of the most spectacular advances in modern mathematical
research are due as much to the efforts of mathematical
logicians as to those of mathematicians of the more traditional
kind.
2 EXPLORING UNIVERSITY MATHEMATICS 1
Such a state of affairs is not wholly new, for at certain
crucial moments in the history of mathematics accepted
mathematical ideas have turned out to be less transparent
than had hitherto been supposed. It will be sufficient to
recall two such episodes.
(1) In ancient Greece, whenever mathematicians dealt with
magnitudes they envisaged these not as abstract numbers but
rather as lengths or areas; and the earlier Greek geometers
thought it obvious that any line which can be obtained from
a given line by simple geometrical construction can be
measured exactly as a multiple of a sufficiently small sub-
multiple of that line. But eventually this belief was dispelled
by Pythagoras, who demonstrated that the diagonal of a
square is not commensurable with the side (i. e. that the
square root of 2 is an irrational number). This discovery
of Pythagoras revolutionized the whole conception of magni-
tude and led to the creation of the famous theory of
proportion contained in Book V of Euclid's Elements, and
also in a later age, much nearer our own day, to Dedekind's
theory of real numbers.
(2) The subject of geometry was conceived by Euclid and
his contemporaries as a study of the properties of space as
it really is — properties that are crudely exemplified in our
physical constructions and measurements — and geometry
was accordingly presented as a deductive theory based upon
a small number of principles that were accepted as indubi-
table. Such a view of the nature of geometrical knowledge
endured for more than 2000 years; but it had, nevertheless,
to be discarded when non-euclidean geometries were found
to be a theoretical possibility, and Euclid's set of axioms
became only one system among many possible ones. From
that time on, euclidean geometry has been interpreted as a
hypothetical theory instead of a factual theory of objective
reality — and the axiomatic method, as thus modified, today
dominates the whole of pure mathematics.
SETS AND FUNCTIONS 3
Discoveries of a foundational character can thus have
far-reaching repercussions within mathematics itself; and
we shall now see how this has happened with the concept
of function. To reach adequate understanding of a mathe-
matical concept or a mathematical theory, of any but the
most abstract kind, one must learn something of its history
and something of its logic. On the whole we would expect
the logic to count for more than the history — from a
strictly mathematical point of view of course, though not
necessarily from the rather different point of view of philo-
sophy. In the present instance I shall begin with history,
since the concept of function is not so much a single abstract
idea as an orientation of mind that has found different
expression in different ages.
For the Greeks, the idea of function did not yet exist. The
Greek theory of magnitudes (as presented by Euclid, for
example) is essentially a static theory, concerned with
discrete quantities and fixed relations between them, whereas
the concept of a function arose in the first instance out of
the consideration of quantities which are susceptible of
variation in accordance with some fixed law. The mathe-
matical concept in fact came into being in the seventeenth
century, in close association with a radically new frame of
thought in natural science. Whereas ancient and mediaeval
science had been qualitative the new science was quantitative,
and the aim of scientists was now to discover mathematical
laws which characterize the interdependence of the physical
quantities in terms of which the relevant state of a given
system can be described. In some simple cases the relation-
ship in question can be expressed by an algebraic equation
(as in the case of Boyle's law pv = k) but more usually a
differential equation is needed. In either case, however,
the physical law is identified with a functional relationship.
As long as we are thinking of dependencies between
physical quantities, the ideas of variable and function seem
very natural. The physical magnitude, e.g. the pressure of
4 EXPLORING UNIVERSITY MATHEMATICS 1
a gas, is something which can be directly measured whenever
required, and which then has a well-defined value. And one
quantity may be considered as a function of another when
the value of the first is uniquely determined by the value
of the second. But when we translate these considerations
into the language of pure mathematics we no longer have
any observable or measurable quantities to fall back upon,
and our "variables" are then mere phantoms. Indeed, one can
argue that the very notion of a variable quantity, conceived
abstractly, is a contradiction in terms. It was in fact argued
long ago by Zeno of Elea, in one of his famous paradoxes,
that a flying arrow can never be moving, since at any instant
it occupies precisely the space in which it is situated.
For a long time mathematicians managed to evade the
difficulty concerning the nature of abstract variables by
relying upon geometrical intuition; and since they were
already very much at home with geometrical relationships
they were able to develop a detailed theory of functions
by such means. Functions were thought to be represented
adequately by their graphs; and a function was said to
be continuous, for example, if its graph was free from
breaks. On this basis a simple proof could be given of the
fundamental theorem that if a continuous function / is
positive for x = a and negative for x = b then there is a
number c between a and b such that f (c) = 0. For we
cannot pass from above the ^r-axis to below the #-axis by a
continuous path without crossing the axis at least once.
Gradually, however, the more careful mathematicians came
to distrust this kind of semi-intuitive reasoning; and Cauchy
and Gauss, in particular, tried to reformulate analysis (i. e.
the theory of functions) strictly in terms of abstract numbers.
While almost all the familiar theorems continued to be
accepted as valid, this was only because ways were now
found of proving them in a rigorous manner.
Criticism of the intuitive approach to mathematics was
pressed even further by Dedekind and Peano, both of
SETS AND FUNCTIONS 5
whom were pioneers in raising the critical study of the
foundations of mathematics to the dignity of an independent
and essential discipline. One of Dedekind's decisive acts
was his uncompromising rejection of the intuitive notion of
continuity as an adequate basis for a theory of continuous
functions; and it is mainly due to this initiative that the sort
of plausibility argument that is exemplified by our earlier
"proof" that any continuous function which changes sign
must attain the value zero is no longer acceptable as a
mathematical demonstration.
It was not Dedekind, however, but Peano who eventually
dealt the death-blow to intuitive analysis by giving an
analytical definition of a space-filling curve. According to
the traditional view every function has a graph, and every
pair of equations x = f (t), y = g (t), in which / and g are
given functions and t is restricted to some given range,
defines an arc of a curve. Peano devised a pair of functions
f and g with the property that, for any xo such that < Xq < 1
and any y such that < y <, 1, there is a t such that
0<t ^l, x = f (to), and y = g (to). In other words he
defined parametrically a "curve" which passes through every
point of the square region bounded by the four straight lines
x = Q, x = I, y = 0, and y — 1. Since a curve in the intuitive
sense is a locus with length but no breadth, it cannot fill up
an area; and there is thus a fundamental incompatibility
between the intuitive notion of curve and the more formal
notion of the locus determined by a parametric representation.
The moral of this, as Peano saw it and as the entire
mathematical world came by degrees to see it also, is that if
mathematical theorems make statements about functions in
an abstract sense they cannot be proved by appeal to what
is "obviously" the case for curves in a totally different
intuitive sense.
Since 1890, the year of publication of the paper in which
Peano defined his anomalous curve, mathematicians have
systematically purged the theory of functions of every vestige
6
EXPLORING UNIVERSITY MATHEMATICS 1
of reliance on geometrical ideas; and the theory in the form
in which it is normally presented today is wholly abstract.
But since, even in the most formal mathematics, we cannot
produce something out of nothing, we must begin with
something that is admitted as logically prior to mathematics,
even though it can no longer be supplied by spatial intuition.
There is now fairly general agreement that what we must
presuppose consists of (1) the basic principles of logic, which
are implicit in the very conception of deductive reasoning,
and (2) certain very general notions such as that of set,
which are themselves closely allied to abstract logic. Using
only such materials, which lie at the furthest extreme of
generality and abstractness, we can in fact refashion not only
the theory of functions but the whole of pure mathematics.
A function is essentially a correspondence, whereby a
value jjq of the dependent variable y is associated with any
arbitrarily chosen value xq of the independent variable x,
and it is thus uniquely determined by the totality of all such
associated pairs of numbers {xo, yo). If we denote by R the
totality or set of all (real) numbers, and by R X R the set of
all ordered pairs (x, y) of two real numbers x and y, every
function / gives rise to a certain subset F of R X R, namely
the set of all ordered pairs (x, y) such that y = / (x). And
conversely, every subset F of R X R with the property that,
for every number x, there is precisely one number y such
that the ordered pair (x, y) belongs to F, determines a
function /. The concept of function is thus definable in
set-theoretic terms; and the same is true in principle of
every other mathematical concept — though it may well be
that the set-theoretic definition of what appears to be a
simple intuitive concept is formidably elaborate*
In discussing the concept of function we have taken the
concept of number for granted, but naturally this concept
also must be made independent of intuition by abstract
set-theoretic definition. How this is done is, in fact, another
story, and by no means a short one. I have already referred
SETS AND FUNCTIONS 7
to Dedekind's theory of real numbers, which provides the
modern answer to Pythagoras's problem of the irrationality
of such numbers as j/2; and this, or some equivalent theory,
is needed as part of our analysis of the number system.
Dedekind defined real numbers in terms of rational numbers
(i. e. ratios or fractions plq) and there is no difficulty in
defining the rational numbers in their turn in terms of the
'natural numbers" 0, 1, 2, — If we can once introduce the
natural numbers in a satisfactory way, then all is relatively
plain sailing.
Detailed examination of the use that is made of the
natural numbers in mathematics reveals that all that needs
to be known concerning these numbers is that they constitute
a progression. In other words, we can generate the entire
system of natural numbers progressively by starting with
and repeatedly passing from n, the last number so far
obtained, to its successor n', i. e. the number n + 1. One way
of introducing the natural numbers into mathematics — a
way favoured by both Dedekind and Peano — is to postulate
the initial number and the successor operation as given,
and then to adopt axioms which confer on the set of natural
numbers the structure of a progression. This is precisely
the function of the well-known Peano axioms for the natural
numbers:
(1) is a natural number.
(2) If n is a natural number, the successor n of R is also
a natural number.
(3) No two natural numbers have the same successor.
(4) is not the successor of any natural number.
(5) If P is a property such that (a) has the property P,
and (b) if n has the property P then so also has n, then
every natural number has the property P. (The axiom
of mathematical induction.)
This axiomatic way of introducing the natural numbers
s EXPLORING UNIVERSITY MATHEMATICS 1
is completely satisfactory for mathematical purposes. But if
we do not wish to postulate special concepts such as those
of zero and the successor operation in addition to the
general concepts of logic and set theory, we can adopt the
alternative course of defining a particular progression in set-
theoretic terms and then using this always to stand for the
progression of natural numbers — which it can perfectly well
do since the natural numbers have no mathematically signi-
ficant features beyond that of being a standard progression.
In fact we use the standard progression very much as the
progression 0, 1, 2, . . ., 10, 11, . . , of numbers in the scale of
ten is used in more intuitive mathematics?.
The customary definition of the natural numbers as sets
runs as follows. We denote by the empty set (i. e. the zero
totality which has nothing at all belonging to it); and if
a.b,...,k are given entities we denote the set of these entities
bv {a, b, . . ., fc}. Then we may form the progression of sets
0, {0}, {0.{0», {0,{0} J {0>{0>}}'--"
by taking first the empty set 0, and then at each stage
introducing a further set whose members are all the sets
previously introduced. The sets of this progression are then
defined to be the natural numbers, and from this point on
they are represented by the familiar symbols 0, 1, 2, 3, —
Such, in brief, is the transformation that mathematics has
undergone as the result of the criticism to which its founda-
tions have been subjected in modern times. And such is the
resulting conception of mathematics that is now presupposed
as a matter of course in current research in this field.
References for further reading
KNEEBONE, G.T., Mathematical Logic and the Foundations of
Mathematics, London 1963 (especially Chapter 5).
ROTMAN, B. and KNEEBONE, G. T., The Theory of Sets and Trans-
finite Numbers, London, 1966.
CHAPTER 2
Special Relativity
and some Applications
J.R.Ellis
The special theory of relativity was proposed by Einstein
in 1905. During the early days of relativity, his work went
under a rather different heading from the title we give it
today. The original paper (Ann. Phys. Lpz. 17) by Einstein
was called "Electrodynamics of moving bodies". The subject
has since been called the special theory of relativity, or
the restricted theory, to differentiate it from his second
theory, the general theory, which he wrote in 1915. The
latter theory was a theory of gravitation, based on concepts
he introduced in his special theory. While the general
theory could never be regarded as a mere application of
the special theory, the special theory has, nevertheless, many
direct applications, so much so in fact, that this theory is
recognized without question as a corner-stone of modern
physics.
Mathematically, the special theory of relativity is really
a geometry: a certain kind of geometry of four dimensions
which connects three spatial coordinates x, y, z and a time
coordinate t. It is not surprising that it is still important in
the development of present theories. Relativity is construed
as an essential part of physics, and all theories must fit in
with this four-dimensional framework and obey the so called
relativity principle.
9
10 EXPLORING UNIVERSITY MATHEMATICS 1
In order to arrive at this four-diinensional geometry which
has just been mentioned, and to explain the relativity prin-
ciple, we must first go back to Newton, or at least his
conception of space and time. His own framework, on which
he founded his three laws of motion, supposes the existence
of an absolute euclidean three-dimensional space and an
absolute universal time, which is common to all observers
independent of their position or motion in space. Space and
time are separate entities. They can be envisaged indepen-
dently of one another.
Suppose a particle moves with uniform velocity in one
dimension (which we may take to be the x-axis). In a
space-time diagram, the motion is recorded as a straight
line (Fig. 2.1).
We can also record a two-dimensional motion in a space-
time diagram without difficulty. For instance, if a particle
moves in a circle with constant speed, the space time diagram
turns out to be a helix (a curve in the shape of a spiral
advancing along an axis) (Fig. 2.2).
However, when we try to represent three-dimensional
motion in a space-time diagram we get into difficulty. We
should need to go into four dimensions to represent it. But
SPECIAL RELATIVITY AND SOME APPLICATIONS
t
11
•-y
Fie. 2.2.
this is still mathematically feasible; and in fact we often
do work with a space of four dimensions in this way,
although it is not possible to represent the situation dia-
gramatically. The kind of geometry to use between x, y, z, t
now becomes an important question.
In classical mechanics, space-time diagrams are merely
representative diagrams, although we can imagine them
based on euclidean geometry by a suitable definition of the
word "graph", without any loss. However, for the purpose
of bringing the work closer to Einstein, we must be prepared
to read a little more into the geometrical relationship con-
necting x and t and to acknowledge that in the end it may
turn out not to be euclidean.
But let us first of all see how, on the basis of euclidean
geometry between x and t in Fig. 2.1, the newtonian picture
represents, for instance, the motion of light, in one dimension.
Light travels with a constant velocity of approximately
3 X 10 10 cm/sec, in a straight line. The magnitude of this
velocity is commonly denoted by c. It would be useless to
try to draw a line representing the motion of a light pulse
on the space-time diagram of Fig. 2.1, since the slope of the
line would be
t 1
x ~ 3 X 10 10 '
12 EXPLORING UNIVERSITY MATHEMATICS 1
and this would be so small that the line would be indistin-
guishable from the axis of x. This, of course, would not
conflict with Newton's opinion that light travelled instanta-
neously, but rather than accepting this, since we now know
otherwise, let us agree to multiply the units in which the
time t is measured, by c. Then the motion of a light pulse
moving along the #-axis would be represented by a straight
line with unit slope [Fig. 2.3 (a)] . We are now measuring time
T (= ct) in light-seconds (the distance in centimetres tra-
velled by light in one second). A light-year (the distance
travelled by light in one year) is approximately equal to
3- 16 X 10 7 light-seconds.
I (=cl)
\
Fig. 2.3.
The dotted line in Fig. 2.3 (a) indicates the motion of a
light pulse emitted in the opposite direction from O. It is
not difficult to see that a ray of light emitted in both
directions from O for a finite time interval between T =
and T = a (say) would be represented as a region of the
space-time diagram [Fig. 2.3 (b)].
If a ray of light shines in both directions for an infinite
time from T = to T = °o (a = °o) the corresponding region
is easily obtained [Fig. 2.4 (a)]. This part of the space-time
SPECIAL RELATIVITY AND SOME APPLICATIONS
13
diagram is called the future region from O, because all points,
or rather events (x, T), within it can be arrived at from O
without having to travel at a speed greater than that of
light (the slope of all lines from O to points in the region
is numerically greater than unity). It is, of course, impos-
sible, from a physical point of view, to have speeds greater
than that of light. The future region thus constitutes the
totality of all events which are accessible from O.
FIG. 2.4.
In a similar way, the region bounded by light pulses tra-
velling towards the origin is called the past region from O
[Fig. 2.4 (b)], and represents the totality of all events which
may communicate with O.
The remaining region is called the present region with O
and is made up of all those events which are neither acces-
sible from O nor communicable with O. We then have the
complete picture, built up from light pulses travelling
towards and away from the origin [Fig. 2.5 (a)].
The terminology "past", "present", "future" is used in anti-
cipation of special relativity, where it is quite common. At
the moment, the terms past, present, future are relative only.
They are entirely dependent on the fact that O is at rest
with respect to absolute space.
If we consider a three-dimensional space-time diagram,
representing a two-dimensional motion in the plane of x and
14
EXPLORING UNIVERSITY MATHEMATICS 1
T
Fig. 2.5.
y, the past, present and future regions of this diagram are
separated by an infinite cone, called the light-cone, whose
semi-angle has unit slope [Fig. 2.5 (b)]. Any cross-section of
this diagram through the time-axis brings us back to a two-
dimensional diagram [Fig. 2.5 (a)].
For four-dimensional space-time, the same terminology is
used, although the light-cone in question is no longer a two-
dimensional surface but a three-dimensional one.
Fig. 2.6.
SPECIAL RELATIVITY AND SOME APPLICATIONS
15
Before we introduce the idea of relativity, we have a fairly
good application of the things we have mentioned so far,
which we might consider for a minute: the past history of
the universe. Let us choose coordinates by supposing that
our Milky Way, the nebula which we, in the solar system,
inhabit, is at the event x = 0, y = 0, z = 0, T = 0. We take
a cross-section in the ^-direction and represent the situation
as shown in Fig. 2.6. Consider the light L which travels to
us from the distant parts of the universe, in fact, from the
distant nebulae, labelled by the iV's. The distant nebulae
are understood to be receding from us with speeds which
vary in proportion to their distance from us.f If we assume
that the nebulae have always been moving with uniform
speeds equal to their present speeds, extrapolation of their
path lines in the space-time diagram will reveal that they
appear to emanate from one common event in the past. This
event is at (0, — cE) where H = 4-1 X 10 17 sec (ot 1-3 X 10 10
years). (It will be recalled that we are measuring the time
T in light-seconds.) It is as though all the nebulae were
packed tightly together 1-3 X 10 10 years ago and then moved
apart as though an explosion had occurred there. It is custo-
f A galaxy's speed (the word "galaxy" is synonymous with "ne-
bula") can be estimated by analysing the spectrum of the light
received from it. The so called "red-shift" — the apparent shift
of the entire spectrum in relation to known patterns of absorption
lines (which indicate the presence of certain elements within the
vicinity of the galaxy) — is interpreted as a Doppler effect, the
diminution of frequency which is observed when a source is mov-
ing away from us. The effect is well known in the study of
sound vibrations. The sudden drop in the pitch of a whistle of
a train as it rushes past us is a common example of this, involv-
ing a raised frequency (advance) and a lowered frequency (re-
cession). Similar principles apply equally well in the case of
light waves, e. g. an ordinarily green light when travelling with
a speed of about c/5 would appear blue if travelling towards us,
and red if travelling away from us.
16
EXPLORING UNIVERSITY MATHEMATICS 1
SPECIAL RELATIVITY AND SOME APPLICATIONS
17
mary to say that H, on the basis of this very naive picture,
is the age of the universe, f
It is interesting to compare this figure, which is arrived
at from a purely optical approach, with the age of the earth
determined from the analysis of radioactive rocks. This
age is about 2-6 X 10 9 years, possibly higher. (At least on
the basis of the space-time model the age of the earth is not
greater than the age of the universe!)
We return to the earth to describe a certain relativity of
motion which was known to Newton. So far the measure-
ment of relative motion has not been mentioned. This is
precisely where relativity comes in. We shall see that new-
tonian relativity, as it is sometimes called, fails in its attempt
to describe relative motion and moving sources of light. The
reasons for its failure lie in the assumption of an absolute
space and an absolute time.
Newtonian relativity is based on the equations
x = x - Vt, y = y, z' = z,
which connect two frames of reference: (x, y, z) which is
fixed in absolute space, and (x\ y', z') which is moving in
the ^-direction with constant velocity V (Fig. 2.7).
Newton simply observed that his famous law of dynamics
force = mass X acceleration
docs not change its character when we pass from the first
coordinate system to the second. This is because the law
is independent of velocity. The essence of this statement
is that if we perform a dynamical experiment when we are
travelling with a constant velocity V with respect to the
t Cosniological theories which envisage a definite point of origin
usually incorporate II as the age of the universe. There are other
cosmological theories which envisage no definite origin. A dis-
cussion of each type of theory is unfortunately beyond the scope
of this lecture.
earth, the result we get would be the same as that produced
by an identical experiment at rest on the ground (assuming
for the moment that the earth is a good enough representation
of absolute space: we shall come to this shortly). Velocity,
dynamically speaking, is not an absolute quantity, but purely
relative, whereas acceleration is absolute
Since Newton's second law is referred to as the "law of
inertia", a coordinate system in which the law is valid is
called an inertial frame. By the principle of newtonian
relativity, an absolute coordinate system can be substituted
by any one of a whole set of inertial frames; the significant
point is that no one of these frames is more important than
any other.
On the basis of newtonian relativity (which it will be
remembered will fail in its attempt to describe kinematics
satisfactorily), let us see how the moving frame of Fig. 2.7
2 Z'
Fig. 2.7.
records a light pulse emitted in the usual way from O in the
fixed frame. Consider the situation in the common x-x'
direction for simplicity, and set up two space-time diagrams,
one for the fixed frame and one for the moving frame.
Fig. 2.8 shows how each frame "sees" the light pulse.
It is clear that a person moving with the origin O' will see
the light pulse travelling with a velocity c — V in the forward
direction and c + V in the reverse direction owing to the
velocity V of O' with respect to O. 0"s conception of past
18
EXPLORING UNIVERSITY MATHEMATICS 1
SPECIAL RELATIVITY AND SOME APPLICATIONS
19
and future will be different from that of O; and, in general,
any optical experiment whicli is carried out in the moving
frame will give a different result to that of a similar
experiment carried out in the fixed frame. Hence the
principle of newtonian relativity holds for dynamics, but not,
it seems, for optics. Because of this, and because the motion
of light is always uninfluenced by the motion of its source.t
T (=ct)
T (=ct)
x = cl
x'= - <c+V)1
X'= (C-V)t
Fig. 2.8.
it should be possible to measure the earth's velocity with
respect to absolute space by doing a suitable optical
experiment on the earth. Such an experiment was carried
out by Michelson and Morley in 1887. They set themselves
the task of finding the velocity of the earth with respect to
the stationary luminiferous aether which in their time was
supposed to characterize and fill the absolute space (or one
of the dynamically equivalent inertial frames) devised by
Newton, which had since become so crucial from the point
of view of optics. Their experiment, it could be said, was
designed to detect an "aether wind".
t This fact emerges in the observation of "double stars", i. e. two
stars moving in orbits around their common centre of mass. If
light took longer to reach us from one star than from the other
(when the first was receding and the second approaching) this
would give rise to anomalies in the observed motions of these stars
(viz. under simple velocity addition, circular orbits would appear
eccentric), and no such anomalies have been found.
This famous experiment of 1887 was a refinement of a
previous experiment which Michelson had carried out in
1880 (at that time the sensitivity of the apparatus he used
made the result unreliable). Michelson's idea was very
simple. It was to send two beams of light from the same
source, fixed to the ground, in two mutually perpendicular
directions and over equal distances, and then to reflect them
back again into the same straight line and measure any
difference in the arrival times of the light waves by observing
interference fringesf (Fig. 2.9).
A = glass plate, half-silvered
on its rear surface.
M,= mirror, reflecting first beam.
B = compensating glass plate
having the same thickness
as A. This is employed so
that the two beams have
equal path lengths in glass.
M, = mirror, reflecting second beam,
LIGHT
SOURCE
",-^j„ o
Fig. 2.9.
The experiment can be compared with a simple problem
in relative velocity. It is not difficult to show that it always
t When two light waves arrive in step with thedr crests together
they reinforce one another and produce a bright fringe (construc-
tive interference); when they arrive out of step cancellation takes
place and a dark fringe is produced (destructive interference). In
practice any difference in the arrival times of the two beams in
Michelson's apparatus could be most easily discovered by rotating
the whole apparatus through 90° when a fringe shift should be
produced, since the roles of the mirrors in the experiment would
be changed.
20
EXPLORING UNIVERSITY MATHEMATICS 1
takes longer to fly an aeroplane at a certain rate up-wind
and back, over a certain distance, than it does to fly at the
same rate perpendicularly across-wind and back, over an
equal distance. (The proof is given in an appendix, but the
reader should attempt to prove it for himself.) Correspon-
dingly, for Michelson's experiment, if an aether wind
prevailed, one light beam must travel up-wind and back
while the second must travel across-wind and back, and the
first beam will arrive later than the second.
The result of the Michelson-Morley experiment was quite
sensational. The two arrival times were the same. No aether
wind existed. The experiment was repeated when the earth
was in a different position in its orbit round the sun and the
same result was obtained. Again no aether wind.
The problems posed by the unexpected result of this
experiment led indirectly to Einstein's theory of relativity.
Einstein established the idea that the geometry of space-time
diagrams (so to speak) was non-euclidean. In this way he
was able to revive the principle of newtonian relativity for
dynamics and extend it to optics (and electromagnetism) as
well, and in doing this, the old idea of absolute space and a
luminiferous aether came to be abolished.
In short it was supposed that the space-time diagrams of
Fig. 2.8 were basically incorrect and ought to be replaced by
identical pictures on the basis of the null result of the
Michelson-Morley experiment (Fig. 2.10).
T (=cl)
T'(=cl')
Fig. 2.10.
SPECIAL RELATIVITY AND SOME APPLICATIONS 21
It was precisely the old transformation rules
x' = x — Vt, y' = y, z — z
which embodied the idea of euclidean three-dimensional
space and absolute time. Instead, new transformations were
given, in which the time T (= ct') of the moving system was
no longer the same as T of the fixed system, thus dropping
the idea that time was something absolute.
The light pulse x = T had to be the same as the light pulse
x = T', and x = — T the same as x' = — V . This was
certainly true. (We follow an argument which Einstein, or
his contemporaries, might have considered.)
In addition there was no harm in tentatively assuming that
the new transformations had an essentially simple character,
and it was therefore supposed, on the grounds of simplicity,
that
x' -T' = h (x - T), (1)
x + T = I (x + T), (2)
where k and I were constants to be determined. It was
reasonable to choose kl = 1 from the point of view of units
in the two systems. This assumption led to the equation
x
.'2 _ T'Z = X 2 - J**
[multiplying (1) and (2)] and this equation also contained the
geometrical aspect of the transformation: the quantity
x 2 — c 2 1- should not change its value on transformation from
one coordinate system to another. Physically this meant that
the light pulse must present identical pictures to the two
frames. (If it had turned out that x 2 + c 2 1 2 were the
invariant, then the geometry between x and T would have,
indeed, been strictly euclidean, by Pythagoras.)
22
EXPLORING UNIVERSITY MATHEMATICS 1
SPECIAL RELATIVITY AND SOME APPLICATIONS
23
Adding (1) and (2), and subtracting (1) and (2) gives two
further equations:
h+ l) X -[ k -l) T > 2T ' = [ k + l
k ~i u
or
fc 2 + l
x =
2k
fc 2 -l
x ~k^l T
fc 2 + 1 r k 2 - i
r = — — lr-
2k
k*+i
The spatial origin x = must be equivalent to x = Vt (or
x = VT/c), since it is travelling with speed V in the fixed
frame (the diagram of Fig. 2.7 still applies). The last
equation therefore gives
fc 2 -l
V
c
an
d h
k* + 1
1
ence
= fi (say).
A: 2 + 1 c ' 2 k ]/(i- Wc 2 )
Thus the rules of transformation are
x' = (S (x - VT/c), T' = f}(T- Vx/c), (y' = y, z m z),
or
x' = p(x-Vt), y' = y, z' = z, f = fi (t - Vx/c*), (3)
where
£ = 1/1/(1 - VVc 2 ).
These are the Lorentz transformations of special relativity.
It is with these transformations that one associates the
principle (really a postulate) of special relativity: "no
experiment can ever detect a uniform motion through space."
The principle is entirely equivalent to the statement that the
laws of nature possess the same form in all frames of
reference in uniform motion relative to each other. In
particular, the laws of dynamics and the laws of electro-
magnetism (including optics) must satisfy (and can be shown
to satisfy) this requirement.
The equations of transformation (3) themselves produce a
variety of consequences apart from the properties of
invariance which we have mentioned.
(i) When the velocity V between the frames (x, y, z),
(x' t y', z') is small compared with the velocity of light, as will
be the case for most terrestrial applications, we find that the
Lorentz transformations (3) reduce to
x'=x-Vt, y' = y, z = z, *'==* (0=1).
These are the old transformation rules, formerly based on
the notions of absolute space and absolute time (f = t), but
now to be regarded as approximations.
(ii) If the equations (3) are solved for x, y, z, t in terms of
x', y' , z ', f, we find
x = p(x' + Vt'), y = y', z = z, t = p (f 4- Vx'/c*)
and this has the same form as (3) except that V is replaced
by — V. This demonstrates the perfect symmetry between
the two frames. No one frame is preferred and no one
frame is fixed in any absolute sense.
(iii) Simultaneity has no absolute meaning in relativity.
If two events (*i, y\, z\, t\), (x2, y2, z& t%) are simultaneous to
an observer, i. e. t\ = t^, they will no longer be simultaneous
to another observer moving with a velocity V, for it will not
be the case that t\ = t{, by virtue of the Lorentz transfor-
mations:
t{ = P (h - Vx x lc*), U = fi{h- Vxs/c*).
These times are unequal. This follows because we have
xi =|= X2- (If x% = X2 the two events would be the same and
questions of simultaneity do not arise.)
(iv) The combination of two successive Lorentz transfor-
mations is still a Lorentz transformation. Suppose a first
frame is related to a second by the equations (3):
24 EXPLORING UNIVERSITY MATHEMATICS 1
x' = 0(x-Vt), y' = y, z' = z, t' = fi (t - Vx/c 2 ),
iff = 1/1/(1 -V 2 /c 2 ),
and the second related to a third by a velocity V:
x" = P'{x'-V't'), y" = y', z" = z', t" = & (*' - V x'lc 2 ),
F = 1/1/(1 - V'W),
then, by eliminating x', y', z', t' between the two trans-
formations, it is not difficult to show that the third frame
is related to the first frame by the equations
^ = /5 "( x -P't), y" = y, z" = z, t" = 0" (t - V" xlc 2 ),
where V" (as it turns out) is not merely the addition of the
two velocities V+V' as one might at first be tempted to
imagine, but is given by a formula
V" = {V + V')l{i + VV'/c 2 ),
which approximates to V + V' when the velocities are small
compared with that of light.
It is interesting to notice that if V = c (or V = c) then V"
is also equal to c, verifying that the velocity of light can
never be increased by "adding" a further velocity to it.
£, e,
V
Fig. 2.11.
SPECIAL RELATIVITY AND SOME APPLICATIONS
25
(v) Moving measuring rods appear to contract. This is a
curious phenomenon, but it must be a consequence of rela-
tivity. Suppose a rod of length I is fixed along the x'-axis
of the moving S' system (Fig. 2.11) between the points
x' = Xi' and x' = x-z' (so that x% — x{ = I).
If the situation is viewed from the frame S at any time t
say, the length of the rod x-2 — xi as it appears in this frame
can be found from the following formulae:
xi = fi (*i' + Vh% t = [it 1 + Vxi'ft?).
giving
xi = j3xi' + V(t- fiVx'i/c*) = fi (1 - We*) x\ + Vt,
i. e.
xi = Xl ' V(l -VW) + Vt.
Similarly,
x 2 = x 2 ' 1/(1 -V 2 /c 2 ) + VU
therefore
x*-xi = {x 2 ' - xi') 1/(1 - F 2 /c 2 ) = IV{1- V 2 /c 2 ).
Hence the rod appears to be shortened by a factor
V (1 — V 2 /c 2 ). This is known as the Fitzgerald contraction,
(vi) Moving clocks appear to run slow. This is an equally
curious phenomenon. Suppose a clock is fixed in the frame
S' (Fig. 2.12) at the point (x', 0, 0). Let t % ' and U be the
times of two consecutive "ticks" that it makes, and the
interval between the ticks t% — i\ — p sec. When the
-X--SF
7
—i
Fig. 2.12.
26
EXPLORING UNIVERSITY MATHEMATICS 1
SPECIAL RELATIVITY AND SOME APPLICATIONS
27
situation is seen from the frame S the interval is no longer
p sec, for
h = fi {h' + Vx'/c 2 ), t 2 - P W + Vx'/c 2 ),
and so
h-h = P W - W) = P/V (1 - VV<*). (4)
Hence moving clocks "appear to run slow" by a factor
1/1/(1 — V 2 lc 2 ): this is called the time dilatation effect.
The phenomenon has very recently been directly confirmed,
experimentally, to within a factor of about 2 per cent.f
The technique adopted did not make use of the ticks of a
moving clock, but of the very short waves produced by a
certain moving gamma-ray source (following the discovery
of R. L. Mbssbauer, which enables us to use certain atomic
nuclei as very accurate frequency standards). The theoretical
principles involved are very much the same as if a moving
clock had been used, but, of course, the success of experi-
ments on time dilatation depends on the existence of ex-
tremely precise methods.
The source was mounted at the circumference of a rotor,
which was set spinning with a very high angular velocity.
The radiation from the moving source was received at the
centre by an absorber.
Since the frequency of the radiation is inversely propor-
tional to its period, we have
where t a , t s are the periods, and v a , v s are the frequencies of
the received and emitted radiation respectively. The velocity
V of the source equals cor, where a) is the angular velocity
f Champeney, D. C, Isaak, G. R., and Khan, A. M., Proc. Phys.
Soc. 85, 5S3 (March 1965). (Their article also contains references
to other recent experiments.)
of the rotor and r is its radius. Thus by the binomial
theorem,
v a co 2 r 2
-==i-a— -.
V» c z
(In view of the magnitude of c, higher order terms may be
neglected.) This theoretical prediction was verified.
Finally, we consider some dynamical consequences of
special relativity.
If a particle of mass m, fixed in a frame S', is moving with
a velocity V with respect to a frame S, Einstein supposes that
the mass which an observer in S would measure is not m but
mtV (i - VVc 2 ) = M (say),
rather like the time dilatation (4); m, being the mass which
an observer at rest relative to it in S' measures, is called the
rest-mass. The quantity M is called the relative mass, and
correspondingly MV, the momentum which S measures, is
called the relative momentum.
In the absence of force and non-dissipative impact, total
relative mass and total relative momentum are conserved.
These basic principles of relativity mechanics can be illu-
strated in the following dynamical problem: suppose two
spherical perfectly elastic particles move along their common
line of centres and collide. If M\ and M% are their relative
masses, and V\ and Vo are their velocities before impact,
and the same symbols with dashes denote the respective
quantities after impact, the assumption that total relative
mass and total relative momentum are conserved yields the
equations
M ! + M 2 = M x ' + M 2 ', M, Vt + M 2 V 2 = Mi 1 VS + M 2 V 2 .
28
EXPLORING UNIVERSITY MATHEMATICS 1
If V\ and V% are known, the velocities V\' and V 2 ' after
impact can be determined, since we have the auxiliary
equations
Mi = mi/ 1/(1 -F?/c 2 ),
Mi = mi/1/ (1 - Vf/c 2 ),
M 2 = m 2 /]/(l -F 2 2 /c 2 ),
M 2 ' = m 2 7l/(1 -V 2 ' 2 I<?).
There are two solutions. One of these is obviously V\ = V\,
Vn = ^Y corresponding to the case of no collision. The other
is the required solution for collision.
In the approximation when V\ and V 2 are small compared
with c, the equation for the conservation of mass gives, by
the binomial theorem,
m (1 + &F1W) + m 2 (1 + W 2 ~Ic 2 )
= m (1 + hVi'Vc 2 ) + m 2 (1 + W 2 ' 2 lc 2 ).
If we multiply this equation through by c 2 , we obtain the
usual equation for the conservation of energy:
hmi V? + hm 2 Vf = imi Vt* + hm 2 V 2 ' 2 .
This shows that the conservation of mass and of energy are
equivalent and related by E = Mc 2 , the famous Einstein
formula. We can see this also by expanding Mc 2 :
E = Mc 2
= mc 2 / ]/(! - V 2 /c 2 ) = mc 2 + hmV 2 + higher terms in Vic.
rest kinetic
energy energy
further
correctioms
In special relativity, Newton's second law is interpreted as
force = - (MV),
at
SPECIAL RELATIVITY AND SOME APPLICATIONS
29
where M is the relative mass. What change does this new law
make to existing dynamics? Virtually none for terrestrial
phenomena since the velocities encountered are so small
compared with the velocity of light. However, the theoretical
difference the law makes to the motions of the planets round
the sun, in relation to their calculated positions according
to newtonian mechanics, is significant in the case of the
planet Mercury, being within the fringe of what is measurable.
It can be shown that on the basis of the special relativistic
form of Newton's second law, the orbit of Mercury is not
an ellipse with the sun as focus as classical mechanics
determines, but an ellipse which is slowly but constantly
rotating about the sun as focus, i. e. an elliptical rosette.
The rate of rotation, given by these calculations, is found to
be 1\ sec of arc per century, which is an extremely small
quantity. Observationally, Mercury does possess this
annomaly in its orbit, but the rate of rotation is, in fact,
approximately six times as much as special relativity predicts.
The added precession is taken to be a consequence and a test
of Einstein's second theory, his general theory of relativity,
and this derives for Mercury the observed precession to
within about 1 per cent.
Appendix
To show that it takes longer to fly an aeroplane at a certain
rate up-wind and back, over a certain distance, than it does
to fly at the same rate perpendicularly across-wind and back,
over the same distance.
Let us suppose that the rate at which the aircraft travels
in still air is c m. p. h. If the wind-speed is V m. p. h.
however, it is clear that the aircraft's speed is then c — V
when it is travelling up-wind, c + V when it is travelling
down-wind, and V (c 2 — V 2 ) when it is travelling across-wind
(in either direction).
30
EXPLORING UNIVERSITY MATHEMATICS I
If we take the first case when the aircraft flies up-wind
over a distance of d miles (say) and returns over the same
distance, the total time of flight is
d d
+
c-V c+V
hours.
In the second case when the aircraft flies across-wind and
back, the time of flight is
2d
V(c*-V 2 )
hours,
the time of the outward and return journies being in this
case equal.
Provided F#=0, the first time is always greater than the
second: the inequality
2dc
>
2d
c z -V 2 V{c*-V*)
is valid because c> V (c* — V 2 ).
CHAPTER 3
Some Properties
of Integers and Primes
S. J. Taylor
Very early in life we learn to count and we build up a
store of experience of the "whole numbers". We learn that
we can add them, multiply them and sometimes subtract or
divide, and that there is always a correct (or unique) answer
for these operations. Our object in this lecture will be to
try and formalize some of these notions and to see that many
of the tricks of manipulation which we learnt to use withoui
fully understanding them can be justified.
Laws of algebra
We assume that we know the set Z = {0, 1, 2, 3, . . . } of
whole numbers, and that the operations of addition and
multiplication give a unique answer and satisfy
a + b = b + a, ab = ba;
(a + b) + c = a + (b + c), {ab) c = a (be):
a (b + c) = ab + ac;
a + = a, a • 1 = a for all a ;
if a=^0 and ab = ac, then b = c.
It is worth remarking that we could derive these "laws" for
31
52
EXPLORING UNIVERSITY MATHEMATICS 1
Z from a more primitive set of axioms, but this would take
much too long. Further, one can deduce other simple laws
from the above. For example, the last law implies that the
product of two integers a, b in Z can only be if at least
one of them is zero.
Order
In learning to count we soon learn that some (finite) sets
are bigger than others. We formalize this by saying that,
of two unequal integers in Z one must be larger than the
other. We write a<Cb for "the integer a is smaller than
the integer b". The order structure of Z satisfies:
SOME PROPERTIES OF INTEGERS AND PRIMES
There is no integer between and 1.
55
if a < b, then a + c < b + c
and ac <. be
if a < b and b < c, then a < c;
< a for all a #= 0.
for all c,
for all c=t=0;
In fact many of the results of mathematics depend for
their validity on the fact that the ordering of Z satisfies a
stronger condition.
Axiom. The ordering of Z is such that every non-empty
subset E of Z contains a smallest element* that is, there is
an element m e E with m <, x for all x in E (any ordered set
satisfying this condition is said to be well-ordered)*
It is interesting that this axiom cannot be deduced from
the laws of algebra together with the fact that Z is ordered.
The reason for its importance is that it implies the validity
of the principle of induction. Before stating this formally
let us make an easier deduction:
For suppose that there were at least one c in Z with
< c <C 1, then the set E of such c would be non-empty. By
the well-ordering axiom E has a least member m and
< m <C 1. But then, multiplying this inequality by m, we
obtain < m 2 < m < 1 so that m 2 also belongs to E and is
smaller than m. This contradiction establishes the truth of
the statement.
Mathematical induction
We must start by formulating carefully the method of
induction which we use frequently. An essential ingredient of
such a proof is that we have a series of statements depending
on the integer n belonging to Z. If H n is such a statement
and
(i) Hj t is true,
(ii) the truth of H n implies the truth of H rt +i for every
n in Z such that n > k;
we want to deduce that H n is true for every n ]> k. For
example H n could be the statement
2«>n;
by proving (i) and (ii) for this statement with k = we would
like to say that it is true for every integer n in Z. If we
call this method of proof the principle of induction, then
The principle of induction is valid.
For. suppose the conditions (i), (ii) are satisfied. Let E
be the subset of Z consisting of those integers n > k for which
H n is false. If E is not empty, it has a least element m.
Then H m is false so, by (i), m=t=fc. Hence m>k, so that
m — 1 > k, since 1 is the smallest positive integer. It follows
34
EXPLORING UNIVERSITY MATHEMATICS 1
that m — 1 is not in E so that H m -\ is true and this implies
the truth of H m by (ii) with n = m — 1 > k. But H m cannot
be both false and true, so our assumption that the set E is
not empty is not possible. Hence E is empty, so that H„ is
true for all n > k.
We deduced the principle of induction from the axiom of
well-ordering: it is worth remarking that one could also
have assumed the induction principle as an axiom and
deduced that the usual ordering of Z is a well-ordering.
In order to illustrate the need for care in applying the
principle of induction, let us "prove" the following.
False Theorem. All the people in any room have the same
colour of hair.
Let E n be the statement that, if a room contains n persons,
they all have the same colour of hair. Since every room
must contain a whole number of people it is sufficient to
prove that H n is true for every n ]> 1 in Z. Applying the
principle of induction (which we have shown is valid) with
k = 1, then (i) Hi is true, for in a room with only one
person he (or she) has the same colour hair as himself. Now
assume the truth of H n and consider any room containing
(n + 1) persons. Number these inmates from 1 to (n + 1)
and first send person number 1 out of the room. The room
now contains n people and by hypothesis they all have the
same colour hair. Call number 1 back and send out person
number (n + 1). The room again has n people in it — all
with the same colour of hair. But now person number 1
must have the same colour of hair as persons 2 to n and
the same is true of person number {n + 1). It follows that
all (n. + 1) persons have the same colour of hair, and we have
deduced the truth of J? n + i and established the second
condition of the method of induction. By induction H„
must be true for every n.
SOME PROPERTIES OF INTEGERS AND PRIMES
35
Question. What is wrong with the above proof?
Hint. Check that condition (ii) is satisfied for every n > 1.
Divisibility
If a, b are in Z we know that the equation ax = b does
not always have a solution in Z. When the equation has a
solution it must clearly be unique: in this case we say that
b is divisible by a or "a divides b" and write a | b. In general
it is possible to have a | be without either a\b or ale.
However, there is a special class of integers a for which this
deduction is valid and Theorem 2 below will establish it. It
is more convenient to define them by a different property.
Definition. A prime is any integer p in Z other than or
1, such that the only integers a in Z which divide p are 1
and p.
Everyone is familiar with the first few primes:
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, . . .;
tables have been prepared listing all the primes up to
10 7 and some very large numbers are known to be primes.
However, there is a very neat proof, due to Euclid, of the
theorem:
There are infinitely many primes.
Suppose this is false; then the set of primes is finite and
they can be listed pi, P2, ■ • •, Pn- Consider the integer
i = PiP2- • • Pn + 1- This cannot be divided by any of the
primes pi, pi, . . ., p n - But t is bigger than any of p\, p% p n
so it cannot be a prime. But any number which is not a prime
must be divisible by a prime, since the smallest divisor
greater than 1 cannot have divisors other than itself and 1.
It follows that t must be divisible by some prime other than
36
EXPLORING UNIVERSITY MATHEMATICS 1
pu ...,p n and we again have contradicted the assumption
that these are all the primes.
We will return later to discuss some of the known (and
unknown) properties of primes. But first we have to
continue our more general discussion of divisibility. If a
and b are in Z {b 4= 0) then we can always divide a by b to
give a quotient q and remainder r which is smaller than b.
This can be formulated as a theorem.
Euclidean Algorithm. Given integers a, b in Z with b>0
there exist unique integers q, r in Z with
a = bq + r, < r < b.
We first prove uniqueness. Suppose q it r i} q 2 , r 2 satisfy
this relation, then
so that
a = bqi + ri = bq 2 + r 2
& I Qi ~ Q2 I = I n — r 2 I .
Thus b divides I n — rg I which is less than b, so we must
have ri — r 2 = 0, whence qi = q 2 . Now let E be the set of
integers of Z of the form a - bx, with x in Z. E is not
empty since x = gives the integer a in Z. By the well-
ordering axiom the set E has a least member r corresponding
to an integer q in Z. Then a = bq + r and we have only to
show that 0^r<b. U r ^ b, then a - b (q + 1) would be
a smaller integer in E contradicting the definition of r.
Hence the relation is completely established.
Greatest common divisor
Given two integers a, b in Z we call cf the g. c. d. of a and
b if d is a divisor of both a and b, and any x in Z which
divides both a and 6 also divides d. Thus
some properties of integers and primes
d\a, d\b and x I a, re I £> imply .x I d.
37
Note that the adjective "greatest" really means that d is a
multiple of any other divisor.
Theorem 1. Any two positive integers a, b have a unique
g. c. d. in Z, denoted (a, b). There are integers u, v (not
unique, and not both positive) such that
(a, b) = ua + vb.
This can be proved by repeated application of the
euclidean algorithm. (I believe that it is usual nowadays
to teach people to find (a, b) by factorizing a and b into
prime factors — this is illogical as the method can only be
justified if one has already proved this theorem or something
equivalent.) Applying the algorithm we get
a = bqi + r u (0<ri<b).
Now if an integer x divides both a and b it must also divide
n, and similarly if x divides b and ri it must divide a: it
follows that (a, b) = (b, n). Repeat the argument on b
and ri if ri =}= 0,
b = n q 2 + r 2 , (0 < r 2 < ri).
As the remainders decrease by at least 1 each time, the
process must ultimately end with a zero remainder after a
finite number of steps:
r»-2 =■ r»-iq n + r n , (0 <, r„<r„_i);
T n -i = r n q n + h
It is clear that each of the pairs a, b; b, ri; n, r 2 ; . . ,;r„-i, r n
have the same set of common divisors: (a, b) = (r n -u r n ) = r n .
38
EXPLORING UNIVERSITY MATHEMATICS 1
SOME PROPERTIES OF INTEGERS AND PRIMES
39
The g. c. d. must therefore be the last non-zero remainder in
this (finite) process.
The uniqueness for the g. c. d. follows from the fact that
if d\, d-z both satisfy the definition for g. c. d., then d\ I lis
and dz | d\, which implies d\ = cfe. The evaluation of integers
u, v such that (a, b) = ua + vb can also be carried out from
the system of equations, by successive substitution giving
each r,- in turn as a linear sum u<a + v%b:
r\ = a — bqi = a + (- qi) b,
rg m b — q 2 n = (— q 2 ) a -f (1 + qiq 2 ) b, etc.
The existence of a g. c. d. enables us to obtain the most
important property of primes:
Theorem 2. For any prime p, if p | ab then p\a or p\b.
Suppose p | ab but p is not a divisor of b, then the only
common divisor for p, b is 1, and so (p, b) = 1. This
means that, for suitable integers u, v
1 = up 4- vb.
Multiplying by a gives
a = upa + oab,
and it is clear that p divides both terms on the right-hand
side so that p j a.
We have already seen (in the proof that there are infinitely
many primes) that any positive integer which is not a prime
is divisible by some prime. This allows us to express any
integer n > 1 in Z as a product of a finite number of primes.
We can now prove:
Theorem 3. The expression of an integer n > 1 as a
product of primes is unique apart from the order of the
factors.
Suppose we have two factorizations into primes:
li = P\ P2 • • • Pn = q\ 32 . • • q m -
Then since p\ | qi (q 2 . . . q m ) either p\ | q\ or pi | q 2 (g» . . . q m ).
by Theorem 2. By repeated application p\ | qi or p\ | q 2 or
Pi | <Z3 (q4 • • • qm) ; and so on. This means that there is some
fci for which pi | q,k v But then we must have p\ = q/^and
this term can be cancelled from the product leaving
P2 P3 • • • Pn = q2 qs' ... q m ',
where the right-hand side consists of the remaining factors
qi. The argument can now be repeated showing that each
of the factors pi, . . ., p„ occurs among the primes qi. After
they have all been cancelled there cannot be any q's left.
Hence n = m, and the factors q,- are just a rearrangement of
the factors p<.
The theorems we have proved are all (with the possible
exception of (a, b) = ua + vb) well known to any schoolboy
in the sense that he believes them and uses them, even if they
are not formulated in his mind. Thus all we have done so
far is to indicate the sort of arguments which can be used
to justify common arithmetical techniques. It is gratifying
to know that these are valid techniques since most of us have
found them very useful in practice. The reader might like
to evaluate the g. c. d.'s (180, 252), (1001, 7655) by the process
of Theorem 2 to convince himself that this is a sensible
method.
Let us now consider some other properties of primes which
either cannot be proved easily or constitute unsolved
problems.
40 EXPLORING UNIVERSITY MATHEMATICS 1
Distribution of primes
We have already shown that there are infinitely many
primes. These can clearly be written clown in increasing
order of magnitude to give an infinite sequence of prime
numbers,
Pi, P2, ■ • •> Pn, —
If one examines this sequence by looking at a table of primes
one forms certain impressions:
(i) the sequence looks irregular if one only considers a
relatively small part of it at once;
(ii) prime numbers become rarer among the large integers
and one feels that if an arbitrary integer greater than,
say a million, is considered, its chance of being a
prime is very small;
(iii) if one counts the number of primes in large blocks,
say between 2 k and 2 k+1 for k = 5, 6, . . ., one obtains
a sequence of integers which is fairly regular, and
does not seem to tend to zero;
(iv) no matter how far one looks in the table there appear
to be (occasional) "prime pairs", that is integers n
such that both n and (n + 2) are primes.
We can formulate the observations (ii) and (iii) precisely:
the status of (iv) is quite different — it constitutes a famous
unsolved problem of number theory and is striking because
it can be stated so simply:
Are there infinitely many positive integers n such that
both n and (n + 2) are primes?
and yet we arc still nowhere near a solution. Incidentally it
is possible to prove that any arithmetical progression of the
form a, a + d, a + 2 d, . . ., a 4- kd, . . ., where (a, d) = 1 (why
SOME PROPERTIES OF INTECERS AND PRIMES
41
is this condition essential?) contains infinitely many prime
numbers.
Since the sequence of primes is completely determined
(though it is not possible to give a simple arithmetical formula
which yields only primes, let alone gives all the primes) we
can define a function by
/ (n) = number of primes p <L n.
Since there are infinitely many primes we know that f (n) -> oo
as n — » oo, but our observation (ii) above leads us to suspect
that /' (n) is small compared with n for large values of n.
The density of primes among the first n integers is
d {n) =
/(«)
and can be computed for particular values of n. It can be
proved without too much labour that
d (n) — > as n -> oo,
so that, for very large n, if we pick an integer at random
among the first n integers it has only a small chance of being
a prime. Actually, although d (n) -» it does so rather
slowly, for n = 10 9 , d (n) is approximately ^.
We observed in (i) that the distribution of individual
primes among the integers is extremely irregular. However,
this irregularity "in the small" disappears if we look instead
at the average distribution given by d (n). The very regular
manner in which d (n) — >0 as n ~> oo is one of the most
remarkable discoveries of mathematics. If log n denotes the
logarithm to base e of the integer n, Gauss already noticed
that d (n) is very close to 1/log n and that the closeness of the
42
EXPLORING UNIVERSITY MATHEMATICS 1
SOME PROPERTIES OF INTEGERS AND PRIMES
43
approximation appeared to improve as n increased. If we
measure the goodness of the approximation by
r(n) =
d(n)
1/log n
then
r (10 3 ) = 1159, r (10 6 ) = 1084, r (10 9 ) = 1053.
Gauss conjectured early in the nineteenth century that this
ratio r (n) -* 1 as n -» °o, but he was unable to prove it. This
result is known as the prime number theorem. Although the
result can be stated using only relatively simple concepts, it
was almost a hundred years before analysis was developed
sufficiently to provide the tools for a proof. (The first proofs
were given in 1896 by Hadamard and de la Vallee Poussin.)
The first proofs used elaborate machinery of complex
function theory, but more recently so-called "elementary"
proofs have been devised. These are still too difficult to be
included in the average honours course for a university
mathematics degree.
of "very large". If the Goldbach conjecture is correct then
it follows easily that every integer n is the sum of at most
3 primes.
Nothing in this chapter has been original. I have only
tried to do two things. Firstly, to indicate how the usual
familiar arithmetical processes can be justified. Secondly,
to illustrate that difficult theorems and unsolved problems
can be stated using very simple concepts.
References for further reading
BlRKHOFF, G. and MacLane, S., A Survey of Modern Algebra,
Macmillan, 1941.
COURANT, R. and ROBBINS, H., What is Mathematics?, Oxford,
1948.
Goldbach conjecture
In 1742 in a letter to Euler, Goldbach observed that, for
every case he tried, each even number, other than 2, could
be expressed as the sum of two primes. He wanted to know
if this was true in general. We still do not know today
though very strenuous efforts have been made on this
problem. The cause of the difficulty is that primes are
defined using multiplication, while the problem involves
addition. However, there has been some progress towards
attacking the problem in recent years, and it is not hopeless
to expect a solution in a finite time. For instance Vinogra-
doff showed that for all very large integers n, it is possible
to represent n as a sum of at most 4 primes, but his proof
is an existence proof and cannot yield a precise definition
CHAPTER 4
Waves
P. Chad wick
1. Introductory remarks
When interviewing candidates for university admission I
am often asked the question "What is applied mathematics
at the university all about?" My stock answer is that
applied mathematics at the university continues to be largely
concerned with mechanics, but that we now look beyond
the motions of particles and rigid bodies to the mechanics
of materials such as gases, liquids and real solids which
deform when acted on by forces. This is all very well but
it gives no idea of what kind of mathematics is involved in
these more advanced studies, nor does it indicate the enormous
variety of physical situations which the mathematics is able
to portray. At the end of a 20 minute interview a short
answer is perhaps excusable, but today I find myself with
a generous allocation of time and a large captive sample of
the kind of people who turn up at interviews.
In order to try to convey to you some idea of the flavour
of applied mathematics at the university level I propose to
discuss in some detail a topic which crops up repeatedly
during the three-year duration of the typical course. This is
the theory of wave motion, and the reason for its frequent
appearance is simply that whenever a material body is
deformable a disturbance produced locally spreads out
through the body as a wave.
44
WAVES
45
Of course waves are a common part of our everyday
experience. We have all enjoyed generating water waves in
the bath and idly observing waves by the sea-side. On
analysing these experiences more closely we realize that
light waves are involved in conveying the undulations of the
water surface to our eyes, and sound waves in enabling us to
hear the noise which accompanies the motion of the sea.
From time to time we read or hear of earthquakes in which
waves travelling through the earth's crust cause vibrations
of the ground capable of demolishing buildings and other
structures. Recently radio waves from outer space have
been in the news, while terrestrial radio waves are the means
of wireless and television broadcasting.
We are all, then, familiar with waves and in particular the
crucial role which they play in communications. In your
case, moreover, this is an informed familiarity because you
have all had physics lessons in which certain important
properties of waves have been discussed and explained.
You know, for instance, that waves can be reflected and
refracted, can exhibit interference and diffraction, and can
sometimes stand still as in an organ pipe or on the strings of
a guitar. But the applied mathematician's approach to waves
is rather different from that of the physicist in that he is
interested in building up a general mathematical description
of wave phenomena rather than in giving so-called physical
explanations of particular wave properties. The operative
word here is general. It is characteristic of the mathematician
that he seeks to place his results in the most general possible
setting. In this way he is often able to show that seemingly
diverse ideas have a common basis. Increasing generality
and the development of unified theories are marked features
of university mathematics, and I hope that my later remarks
will show you how these tendencies operate in the particular
case of wave theory.
Before turning to some actual mathematics, mention should
be made of the considerable impact which the study of
46
EXPLORING UNIVERSITY MATHEMATICS 1
wave phenomena has had on the historical development of
mathematics. The classical theory of waves is largely the
creation of the great mathematicians of the eighteenth and
nineteenth centuries, but during the present century there
have been revolutionary developments in theoretical physics
in which the wave concept is of fundamental importance;
and progress has been made in the study of what are called
non-linear waves. The best known example of a non-linear
wave is the shock wave. Shock waves are produced in the
air by explosions and when an aeroplane accelerates until
its speed exceeds the speed of sound. These supersonic
bangs are also much in the news at the present time, but in
this chapter I shall consider only the more domesticated
linear waves.
2. Simple harmonic oscillations
In order to fix a few ideas I would like first to say
something about simple harmonic oscillations. Let us con-
sider three examples.
(a) Fig. 4.1 (a) shows a particle P of mass m attached
to one end of a light elastic spring. The other end of the
spring is fixed and the whole system (i. e. the spring plus
the particle) is assumed to move in a fixed horizontal straight
line. Let the spring have natural length / and elastic
modulus X, and let the displacement of P from its position
of equilibrium be £. Then the equation of motion of P (i. e.
Newton's second law of motion applied to P) is
dH X
where t denotes time.
(b) In Fig. 4.1 (b) the particle P (again of mass m) is free
lo slide on the smooth curve C which is fixed in a vertical
plane. Let the tangent to C be horizontal at the lowest
point O and suppose that P is released from rest at a point
WAVES 47
on C close to O. Then if s denotes the arc distance OP and
R is the radius of curvature of C at O, the equation of
motion of P is, to a first approximation,
d s s mg
"> ~r~ = — ir s>
dt* R
g being the acceleration due to gravity.
\-iumrmrmmrmr- • 9 m
111
(C)
equilibrium position of P
Fio. 4.1. Systems executing simple harmonic oscillations.
(c) The third example (see Fig. 4.1 (c)] concerns a uniform
circular disc attached at its centre O to a vertical wire the
upper end of which is fixed. We suppose that the disc is
mounted in a horizontal plane and is rotated slightly from
its equilibrium position before being released from rest. It
48
EXPLORING UNIVERSITY MATHEMATICS 1
then proceeds to execute torsional oscillations governed by
the equation
d 2
/— = -DO,
dt*
being the angular displacement of the disc from its
equilibrium position. The constants / and D appearing in
this equation are respectively the moment of inertia of the
disc about the axis of rotation and the torsional rigidity of
the wire.
These three examples have two features in common.
(i) The motion is completely determined when a single
variable (f, s or 6) is known as a function of the
time t.
(ii) This dependent variable in each case satisfies a
differential equation of the form
d^_
dt*
+ w 2 £ = 0,
(1)
where w is a constant characteristic of the physical
system.
Thus we can formulate the following statement.
Whenever the state at time t of a system is specified
by a single number which is determined by solving
a differential equation of the form (1), then that
system is performing simple harmonic oscillations.
In the above statement the concept of simple harmonic
oscillations is associated with a purely mathematical idea,
!
WAVES
49
namely the differential equation (1). This idea is essential
to all simple harmonic oscillations irrespective of the
particular physical context in which they occur. Had we
defined simple harmonic oscillations on the basis of acceler-
ation being proportional to distance, our definition would
have had to be worded so as to include linear acceleration
[example (a)], acceleration along a curve [example (b)] and
angular acceleration [example (c)], and even then it would
apply only to the oscillations of a mechanical system. But |
in equation (1) could represent current in an electrical circuit
or, say, the brightness of a star at a given time. We have
here, therefore, an example of the way in which the
mathematician relates effect occurring in different physical
situations to a central mathematical idea. In fact we have
set up what might be called an abstract mathematical theory
of simple harmonic oscillations which consists of the study
of the solutions of equation (1).
As is well known, assuming that o) 4= 0, the most general
solution of equation (1) is
f = a cos {oyt + a),
(2)
i
where a and a are arbitrary constants. In other words, every
solution of equation (1) can be obtained from the expression
(2) by assigning suitable values to a and a. For a particular
oscillation, equation (1) must be supplemented by two
subsidiary conditions which serve to determine a and a.
These conditions usually consist in specifying the values
of f and dtj/dt at a particular time, normally t = 0. In
physical terms this amounts to stating the manner in which
the oscillation is excited.
The basic differential equation, then, together with two
subsidiary conditions determines a unique solution of the
form (2). a is called the amplitude of the oscillation and a
50
EXPLORING UNIVERSITY MATHEMATICS 1
the phase angle. The period, 2ti/co, depends entirely upon
the nature of the system and not upon the manner in which
it is made to oscillate.
3. Oscillations, waves and differential equations
From what has been said about simple harmonic oscillations
I hope that you will now understand what is meant by
saying that, to the applied mathematician, an oscillation
is something which involves the solution of one or more
differential equations. The solutions are functions of time
and they usually have a periodic character like the result (2).
This leads us to expect that the mathematics of waves,
which I mentioned earlier, will also be, basically, a matter
of solving differential equations. Thus the whole subject
of oscillations and waves derives its mathematical framework
from the calculus which, after all, has been described (by
E T. Bell) as the "chief instrument of applied mathematics".
The mathematical distinction between an oscillation and a
wave is that whereas oscillations are governed by ordinary
differential equations, waves are governed by partial diffe-
rential equations.
Now before going any further we must be quite sure that
we understand what these terms mean. In the differential
calculus we meet functions of a single variable [e.g. f {x)\
and study, among other things, properties of their derivatives
(e. g. dfldx, d 2 f/dx 2 etc.). Later (usually at university) we
encounter functions which depend upon two or more
variables, for example q> (x, y). Here we need to know
values of both x and y before we can determine rp. Now
this function, <p (x, y), can be differentiated in more than
one way. We can imagine that y is a constant and then
differentiate q> as if it were a function of x only — this
derivative is written dcpldx — or we can treat x as a
constant and differentiate with respect to y, obtaining in
this case dcpldy, dcpldx and dcpldy are called the partial
"WAVES
51
derivatives of <p f and we can also define partial derivatives
of the second and higher orders. To take a simple example.
If cp (.v, y) = x 2 y + xy s ,
dw d<p
then -J- = 2xy + y* and — = X s + 3 xy*.
dx Oy
A differential equation is an equation which contains deri-
vatives: it is said to be a partial differential equation if it
contains partial derivatives; otherwise it is called an ordinary
differential equation. Equation (1) is an ordinary differential
equation and we shall now expect to set eyes on a partial
differential equation as soon as we embark on a mathematical
treatment of waves.
4. The wave equation
Having prepared the ground let us now try to arrive at
this wave equation as rapidly as possible. We shall do this
by constructing a mathematical representation of a simple
type of wave. According to the Shorter Oxford English
Dictionary, a wave consists of "each of those rhythmic
alternations of disturbance and recovery of configuration in
successively contiguous portions of a body or medium, by
which a state of motion travels in some direction without
corresponding progressive movement of the particles succes-
sively affected". This is a heroic attempt at the hopeless
task of embracing in a single statement all the possible
meanings of the term "wave". However, we single out for
our present purposes the idea of a wave as a disturbance
travelling in some fixed direction.
Consider a "waveform" travelling in a fixed direction with
constant speed c. We set up two sets of axes with origins O
and O' as shown in Fig. 4.2 (a) and (b), O being fixed and O'
moving relative to it along the x-axis with speed c. Thus O'
52
EXPLORING UNIVERSITY MATHEMATICS 1
moves with the waveform which, relative to the axes with
origin O', can be represented by an equation of the form
<P = f(X),
(3)
X being distance measured from O' in the direction of
propagation. The function / specifies the wave profile.
rig. 4.2 (a) depicts the situation at time to, and we suppose
that, at this instant, OO' = a. Then the relation between
(a) time t,
ID) lime
•a + cn-t,
— X
Fig. 4.2. Representation of a travelling waveform.
the distances x and X at time U, is X = x -a. Fig. 4.2 (b)
shows the position of the waveform at time t {> t ). In the
interval t — to which elapses between the two states, O'
moves a further distance c (t — t ) away from O so that, at
time /,
X = x~a-c(t- t ). (4>
Combining equations (3) and (4) we obtain the following
representation of the travelling waveform at time t:
WAVES
<p = f [x — a — c(t — to)}-
53
(5)
Now the constant a is at our disposal and so we can set
a = cto: this simply means that O and O' coincide at time
t = 0. Equation (5) then simplifies to
<p (*, t)=f(x- ct),
(6)
the notation on the left-hand side indicating that the
function <p representing the waveform depends upon distance
x in the direction of propagation and time t. There is no
need for us here to associate <p with any particular physical
quantity.
Equation (6) is the required result. For any function / it
represents a wave propagating with constant speed c and
without change of shape in the positive .^-direction. By
an exactly similar argument a representation of a wave
propagating in the same manner in the opposite direction is
<P (x, i)=g(x + ct).
(7)
Now what we are really seeking is a partial differential
equation which has the expressions (6) and (7) as solutions.
In order to find this equation we work out some partial
derivatives of <p. From equation (6),
<p = f {X), where X = x — ct, and so
dcp
Jx~
Thus
df dX
~dX dx
df d<p df dX
= dX ~dt ~"dX~dt
d s q>
~dx~*
d*f d*cp d 2 f
~ dX*' ~dt* " dX 2
1 dfy d 2 (p
c* dt* dx°-
= — c
dX
(8)
54
EXPLORING UNIVERSITY MATHEMATICS 1
and it can easily be shown that the expression (7) also
satisfies this equation.f
We have therefore arrived, by a direct but essentially
intuitive argument, at what is called the one-dimensional
wave equation (8). Equation (8) is said to be one-dimension-
al because <p depends upon time t and a single coordinate x.
5. Waves on a taut flexible string
Now that we know what the wave equation looks like we
can set about showing, in a more or less respectable way,
how it governs a particular type of wave motion. Probably
the simplest physical system which admits wave propagation
is a stretched string, and we proceed on the basis of the
following assumptions:
(i) The string is uniform and perfectly flexible;
(ii) The weight of the string is negligible in comparison
with the tension;
(iii) Fluctuations in the tension due to the motion are
negligible;
(iv) The string suffers only small deviations from its
equilibrium position.
In view of assumption (ii), the equilibrium position of the
siring is a straight line and we take this to be the x-axis.
The displacement cp of the string from its equilibrium position
will, in general, vary with time and from point to point on
the string. Thus cp is a function of * and t. Consider an
t It will be observed that the expression (6) satisfies the partial
differential equation
_ 1^.— &!£
c dt~ dx '
which is simpler than equation (8). But this partial differential
equation is unacceptable because it is not satisfied by the ex-
pression (").
WAVES
55
element PQ of the string of length ds and denote by tp and
tp + dtp the angles which the tangents to the string at P and
Q make with the x-axis (see Fig. 4.3). Because of assump-
tions (i), (ii) and (iii), PQ is subject to no forces other than
Fig. 4.3. Displaced configuration of a taut flexible string.
the constant tension T acting tangentially at P and Q. The
equation of motion of the element perpendicular to the x-axis
is therefore
gds —^- = T sin (tp + dip) — T sin tp
dt*
= T cos?/; dip,
where g is the (constant) mass per unit length of the string.
Dividing by Tds and proceeding to the limit ds — » 0, there
followsf
p d z cp dtp cos tp
— — — = cos tp —
T a/ 2 r ds
R
(9)
R being the radius of curvature of the displaced configuration
of the string. Finally, we express assumption (iv) in the
t It should be verified that the terms neglected do not affect
the limit.
56
EXPLORING UNIVERSITY MATHEMATICS 1
following more precise form: at all points of the string and
at all times the angle y is small. Then
1 _ d*q>
and with these approximations equation (9) becomes
1 d s <p
c 2 dt*
d*-(p
where c 2 = Tig. Thus the transverse displacement cp satisfies
the one-dimensional wave equation (8).
Having shown earlier that the expressions (6) and (7) are
solutions of equation (8), we can now assert that, subject,
of course, to the four assumptions which I have listed,
waves can travel along a taut flexible string in either
direction without changing their shape and with constant
speed V (T/q). A system which transmits waves of arbitrary
form without distorting them is said to be non-dispersioe.
In order to examine the wave motion of a stretched string
in more detail, we now take the very important step of
observing that when we add together the functions / and g
appearing in equations (6) and (7) we get a further solution
of equation (8): that is the expression
q> = f(x-ct)+g(x + ct)
(10)
satisfies equation (8). Now this means that waves can travel
along our string in both directions at the same time, each
wave propagating exactly as if the other did not exist.
Equation (10) tells us that the displacement of the string at a
given point is obtained by adding together the displacements
produced by the two waves propagating by themselves.
Mathematically this result is a consequence of a general
WAVES
57
property of equation (8), namely that if we have a number
of solutions then their sum is also a solution. This property,
which is called the principle of superposition, is one of (he
corner-stones of the mathematical theory of linear waves.
So far we have not had occasion to specify the form of
the functions / and g, nor has anything been said about what
happens when a wave reaches an end of the string. Let us
now consider a case in which / and g both represent sinu-
soidal waves by putting
/ (x - ct) = ha sin{k (x - ct)),g (x + ct) = \ a sin{fc (x + ct)}.
Here a and k are constants, and the reason for including the
factor h will appear in a moment. In view of equation (10)
we can now state that a solution of the wave equation (8) is
<p = ha [sin{fc (x - ct)}+ sin{fc (x + ct)}\ = a sin foe cos kct. (11)
FlG. 4.4 Standing wave on a taut flexible string.
To see what kind of disturbance of the string this represents,
first take t to be constant; that is take an imaginary snapshot
of the string at a certain instant. At this instant the string
is in the form of a sine wave (see equation (11) and Fig. 4.4),
and we see that there are certain points, called nodes, at
which the string is permanently undisplaced. Next take x
to be constant: that is choose a particular point P on the
string. It follows from equation (11) that P executes simple
harmonic oscillations about its equilibrium position Po. The
overall picture which emerges, therefore, is that of a standing
wave, and we conclude that the superposition of two sinu-
58
EXPLORING UNIVERSITY MATHEMATICS 1
soidal waves of equal amplitude travelling in opposite direc-
tions gives a displacement pattern of a stationary (i. e. non-
progressive) character.
Referring to equation (11) we see that the positions of the
nodes are given by sin kx = 0. The non-negative solutions
of this equation are
n 2n
*-*• *• T
so that the nodes are equally spaced at intervals of nlk.
Now if we fix the string at a node, then cut it immediately
to the left of the fixed point and throw away the left-hand
portion of the string the remainder goes on vibrating exactly
as before. In other words, conditions at a node are exactly
the same as conditions at a fixed end of the string. This
means that a standing wave can be maintained on a string
of finite length / fixed at its ends provided that / is an
integral multiple of nlk. But the constant k is at our disposal,
so it follows (after a little reflection) that a standing wave
of amplitude a with n + 1 nodes on a string of length I is
represented by the solution (11) with k = null. Standing
wave configurations with 2, 3 and 4 nodes are shown in
Fig. 4.5.
Although these remarks on standing waves have referred
specifically to a taut string fixed at its ends, we can state
the end-result in purely mathematical terms. We have
shown that solutions of equation (8) satisfying the subsidiary
conditions
g> = when x = and when x = I
are given by
nnx nnct
<p = a sin — 7- cos — ; — , where n = 1 , 2, 3, . . ..
/
/
WAVES
59
Now in a university course these solutions would probably be
found by applying to equation (8) a technique (called the
method of separation of variables) which yields other
solutions capable of satisfying different subsidiary conditions.
In this way an abstract theory for waves satisfying the one-
First harmonic
(tunda mental )k =vl\
Second harmonic, k = 2*71
Third harmonic, k = 3w/
FlG. 4.5. Standing wave configurations with 2, 3 and 4 nodes.
dimensional wave equation is built up, and this theory can
at once be brought to bear on the physical situations from
which equation (8) emerges. In the remainder of this chapter
I shall take a brief look at two such situations.
6. Sound waves
Sound waves are waves of compression, and since all real
materials undergo changes of volume when subjected to
pressure, sound waves can propagate in gases, liquids and
60
EXPLORING UNIVERSITY MATHEMATICS 1
solids. The situation in respect of solids is rather different,
and more complicated, than for gases and liquids, however,
and we shall defer consideration of elastic waves until later.
Suppose first that we have a tube containing a gas, say air,
and propagate a wave along this column of air, perhaps by
striking one end of the tube. Then u, the displacement of a
typical point of the gas from its equilibrium position, is a
function of x, distance measured along the tube, and time t.
It can be shown, by writing down the equation of motion
of an element of the gas, that If satisfies the one-dimensional
wave equation
1 d 2 u _ d 2 u
c 2 dt 2 ~ dx* '
where c, the speed of sound in the gas, is given by
c* = k/g .
Here k is a suitable bulk modulus and go is the density of
the gas in its undisturbed state. By arguments with which
you are no doubt familiar, the appropriate bulk modulus is
given by k = ypo, where y is the adiabatic index and po the
pressure of the gas in its undisturbed state.
Although waves on a taut flexible string and sound waves
in a tube are governed by the same partial differential
equation, they differ in one important respect. The string
is everywhere displaced at right angles to itself, whereas
the passage of a sound wave through a gas is accompanied
by motion of points of the gas in the direction of propagation
of the wave. Thus we say that sound waves are longitudinal
waves, while the waves on flexible strings which wc have
discussed are transverse waves.
Now, of course, gases are not always confined to tubes,
and it is a matter of common experience that sound can
WAVES
61
travel through an unconfined body of gas such as the open
air. The question therefore arises: "What is the equation
which controls the propagation of sound in the most general
circumstances?" The answer is the three-dimensional wave
equal ion
1 d 2 <p d 2 <p d 2 (f
~ dx 2
c* dt 2
d 2 q>
dy 2 dz* '
(12)
w v displacement
vector
FlG. 4.6 Three-dimensional rectangular cartesian coordinate
system.
where x, y, z are rectangular cartesian coordinates (sec Fig.
4.6). The dependent variable <p in equation (12) is a function
of the four independent variables x, y, z, t, three of which
are held constant when calculating one of the partial
derivatives. <p does not represent the displacement of a
typical point of the gas (which is now a vector quantity),
but the displacement can be found when <p is known.
The three-dimensional wave equation (12), besides being
the basic differential equation in the theory of sound waves,
enters into the theory of electromagnetic waves, such as
62
EXPLORING UNIVERSITY MATHEMATICS 1
light and radio waves. There is also a two-dimensional
wave equation, obtained by deleting the final term of (12),
which governs the transverse motion of a stretched membrane
situated, when undisplaced, in the (x, i/)-plane.
7. Elastic waves
Finally, we come to elastic waves under which heading, as
mentioned earlier, we include sound waves in solid materials.
The property of elasticity has to do with the ability of a
solid body to deform in response to applied forces and then
to recover its original form immediately those forces are
removed. A very wide range of solids exhibit this property
so long as the applied forces are not too large: examples arc
the common metals and alloys, rock and concrete. Now
a solid can be deformed in different ways. Like a gas or
liquid it can be compressed, and so has a bulk modulus k
which measures the volume change produced by unit pressure.
Unlike other forms of matter, however, a solid can sustain
forces tending to shear, or twist it, and the stress required
to produce unit shear is measured by the modulus of rigidity
ft. Thus two material constants, k and //, are needed to
specify the elastic properties of a solid, and our earlier
results lead us to expect that these constants, together with
the density g, will determine the speed with which waves
propagate through the material.
The displacement suffered by a typical point of an elastic
solid during the passage of a wave is a vector quantity and
so has three components u, v, w (see Fig. 4.6) each of which,
in general, is a function of the four variables x, y, z, i. It
follows that wave motion in an elastic solid is governed not
by a single partial differential equation but by three. These
equations turn out not to be wave equations, but the follow-
ing:
WAVES
d 2 u d (du 6d dw\
e— = (* + M— I— + — + — \ + f*
dt 2
d 2 v
dx \dx dy dz )
d (du dv dw\ (
(d 2 u
[dx^
d 2 u d 2 u
dy* dz*
dt* dy (dx dy dz
(d 2 v d 2 v d 2 D
dx* dy* dz*
d 2 w d ,
e-^ = ( fc + M^- T- + T-
dt*
dz
(du do dw
dx dy dz
dw\
(d*w
d*w
d*w
dz*
The solution of equations (13) is clearly a formidable under-
taking and has been exercising mathematicians for the last
100 years or so, but we can achieve a surprising degree of
simplification by supposing that the displacement compo-
nents depend only upon x and t. Equations (13) then reduce
to
d*u
d 2 u
PX"
d 2 V
d 2 o
9 dt* ** dx*
d*w
~dt*
d 2 w
dx 2
(14)
which are three one-dimensional wave equations.
Now looking for solutions of equations (13) which depend
only upon x and t is tantamount to considering the possible
existence of elastic waves travelling parallel to the Ar-axis
which give the elastic body the same displacement at all
points on planes perpendicular to the x-axis. Since we
have come to realize that whatever quantity satisfies the
wave equation is propagated as a wave, we conclude from
equations (14) that waves of the assumed type (plane waves)
can in fact exist. Moreover, we see that there are two kinds
(13)
64
EXPLORING UNIVERSITY MATHEMATICS 1
WAVES
65
of elastic waves travelling with different speeds, the two
speeds being
a ={(* + ! V)/Q} l/S , c 2 =(///p)V S .
For all known solids, ci > c-2.
Since u is the displacement component in the direction of
the avaxis, the wave travelling with speed Ci is longitudinal.
Elastic waves of this type are directly analogous to sound
waves and are called P-waoes (P for primary or for push-
pull). The waves travelling with speed c& on the other
hand, are transverse and have no counterpart in gases and
liquids. They are called S-waoes (S for secondary or for
shake).
We have shown here that an elastic disturbance of a
particularly simple kind travelling parallel to the jc-axis
is composed of a P-wave and two 5-waves. In fact this
result can be generalized and it can be shown that any elastic
wave travelling in the interior of a solid body is made up of
a P- and an S-constituent.
8. Seismology and the earth's interior
To end on a physical rather than a mathematical note, I
would like to indicate how the simple facts about elastic
waves which have just emerged have been used in determin-
ing in broad outline the internal structure of the earth.
At the beginning of this chapter I mentioned earthquakes.
The science which is concerned with the measurement and
interpretation of earthquakes is called seismology and it is
the most highly developed branch of physics of the earth or
geophysics. Earthquakes occur within the outermost 500
miles of the earth, usually within 50 miles of the surface. A
particular earthquake causes most disturbance at the place
on the earth's surface immediately above, but in addition it
generates elastic waves which travel outwards from the
source of the eathquake in all directions and can be detected
at points far removed from the severely disturbed area.
The elastic disturbance produced by an earthquake
contains both P- and S-waves which, of course, travel with
different speeds. Moreover, because the density, bulk mo-
dulus and modulus of rigidity of the material from which
the earth is made vary with depth, so do the speeds c\ and
C2, and this means that the seismic waves do not travel
through the earth in straight lines but follow curved paths
as shown in Fig. 4.7. In their journey through the earth
Epicentre
Earthquake locus
Fig. 4.7. Paths of seismic waves.
the waves may bounce off the surface and when this happens
an incident P-wave may give rise to a reflected P-wave and
a reflected 5-wave as well. It will be appreciated, therefore,
that the motion of the ground recorded at great distances
from an earthquake is very complicated. Nevertheless, this
is the record of a disturbance which has travelled through
the deep interior of the earth, bringing to the surface
information about the nature of the material through which
it has passed, and it is the deciphering of earthquake
records (or seismograms as they are called) which has yielded
much of our present knowledge about the inaccessible interior.
The task of determining from seismograms the paths along
which the elastic waves generated by a particular earthquake
travel through the earth calls for elaborate calculations as
well as a high degree of skill and experience on the part of
the seismologist. During the past 70 years, however, a
66 EXPLORING UNIVERSITY MATHEMATICS 1
network of seismological observatories covering most of the
continents has been built up and more and more earthquakes
have been recorded and analysed. From this vast quantity
ol data, tables have been computed which make it possible
to reconstruct the wave paths emanating from a particular
earthquake with great accuracy. The picture which emerges
from this reconstruction is a striking one.
Suppose that an earthquake were to occur beneath the
North Pole. Then over the entire northern hemisphere and
as far as latitude 15° S. (i. e. to a circle passing through
central Brazil, Zambia and the north of Australia) the
arrival of both P and S waves would be recorded. Between
latitudes 13° S. and 52° S. (a circle through the southernmost
tip of South America) hardly any ground motion would
occur, and between 52° S. and the South Pole only P-waves
would be recorded. The interpretation of this pattern
resulted from ideas put forward in 1906 by R. D. Oldham
and later followed up by a number of other seismologists.
The results obtained by these workers indicate that the earth
consists of a liquid core surrounded by a thick solid shell.
The core, being liquid, is unable to transmit S-waves and,
furthermore, is a much poorer transmitter of P-waves than
the solid shell. The effect of the core on seismic waves can
thus be pictured as follows. Imagine the earthquake to be
replaced by a lamp emitting light of two colours, white
corresponding to P-waves and red corresponding to 5-waves.
The core is completely opaque to red light and so will cast
a shadow at the earth's surface extending from the South
Pole to some lower latitude. While not opaque to white
light the core has different transmitting properties from the
surrounding mantle. This means that it acts as a giant lens,
focusing the white light towards the South Pole and
producing a shadow zone at lower latitudes (see Fig. 4.8).
The presence of a liquid core within the earth had been
predicted during the nineteenth century from rather different
considerations. It had been observed, for instance, that on
WAVES
67
descending a mine the temperature increases steadily. This
rate of increase, if extended to great depths, would imply
that the central part of the earth must be molten.
Mantle
Fig. 4.8. Formation of seismic shadow zones.
The confirmation of this earlier conjecture represents one
of the most striking achievements of seismology. The solid
outer shell of the earth is called the mantle and the depth
of the core-mantle boundary below the surface (first
determined in 1915 by B. Gutenberg) is 1801 miles.f
A further major seismological discovery was made in 1936
by the Danish woman seismologist Inge Lehmann. I have
stated that an earthquake occurring beneath the North Pole
would cast a shadow between latitudes 13° S. and 52° S. In
fact it is found that the shadow is not quite complete, weak
t The mean radius of the earth is 3959 miles.
68
EXPLORING UNIVERSITY MATHEMATICS 1
P-waves being recorded within this zone. Miss Lehmann
suggested that these waves could be due to the presence of
an inner core which is a very good transmitter of P-wavcs
(see Fig. 4.8). This explanation has been confirmed and the
radius of the inner core fixed at 777 miles. It is widely
believed that the inner core is solid, but this has not been
conclusively proved.
This brief survey shows how the study of earthquakes, in
conjunction with the theory of elastic waves, yields a broad
picture of the earth's interior as consisting of three major
components: an inner core, which is probably solid, sur-
rounded in turn by a liquid core, and a solid mantle.
9. Concluding remarks
In this chapter I have tried to give some inkling of the
kind of material which appears in university courses in
applied mathematics and to indicate the habits of thought
cultivated by the applied mathematician. I hope that 1
have not utterly depressed you by drawing the curtain aside
ou some rather fearsome partial differential equations or
raised your hopes too high by ending with a rather
spectacular application of wave theory.
References for further reading
Bell, E. T., Mathematics, Queen and Servant of Science, G.Bell
& Sons, 1952 (especially chapter 17).
Feather, N., Vibrations and Waves, Penguin Books, 1964.
HODGSON, J. II., Earthquakes and Earth Structure, Prentice-Hall
1964.
Jeans, J. H., Mathematics of Music, in The World of Mathematics
(ed. J. Newman), vol. 4, pp. 2278-2309, Allen & Unwin, 1960.
Sutton, O. G., Mathematics in Action, 2»d edition, G.Bell & Sons,
1957 (especially chapter 4).
CHAPTER 5
Square Fibonacci Numbers
J. H. E. Cohn
Introduction
It is usually thought that unsolved problems in mathematics,
and perhaps especially in pure mathematics must necessarily
be "hard" in the sense that the solution, if one is ever found,
will involve some essentially new idea or method. (Of
course, this description of "hardness" has nothing to do
with length; a "hard" proof may be very short and an
"elementary" one long and complicated.) In no other field,
it is alleged, is this more true than in the theory of numbers.
Shortly after taking my degree, when I said to my tutor
that I would like to do research in the theory of numbers,
I was strongly advised against this on the grounds that the
remaining problems were so hard that it was almost certaiu
(hat I would fail to solve any, and in the most unlikely
event of my succeeding I would deserve election to the Royal
Society rather than a doctorate.
I feel now that this advice was perhaps rather misguided,
and I hope to show this by giving an elementary proof of a
well-known conjecture in the theory of numbers, first proved
in 1963. As I have said, "elementary" means that all the
stages of the proof follow by simple reasoning from known
theorems. At several stages I shall quote theorems or results
which are well known or easily established. Proofs of these
are given in the Appendix.
69
70 EXPLORING UNIVERSITY MATHEMATICS I
The problem discussed here is mentioned in Tomorrow's
Math, by C. S. Ogilvy, on p. 100. (This book by the way
is a veritable mine of easily understood unsolved problems.)
Let the numbers u n be defined for all positive integers n,
by ui = U2 = 1; u„ + 2 = u« + i + u n for n > 1. The first twenty
or thirty such numbers are easily calculated, after which
they increase rather rapidly — ueo has thirteen digits. Of
the first few numbers, it will be observed that u\ = Ui = \
and U12 = 144 are perfect squares, and there do not appear
to be any others, at least for fairly small values of n. It
had been conjectured that in fact there are no more, and this
conjecture, although it seemed "reasonable" had so long
eluded proof that a computer was used to see whether it
could find any counter-example. The first million numbers
u n were in turn examined and no further perfect square was
found. In fact this is hardly surprising; we shall prove in
Theorem 3 that there are no more at alL
In the first place we observe that we can extend the
definition of u n to cover the cases n <, 0, by using the same
recurrence relation. It is then found that uq = and u_i = 1
and we shall prove that u n is a perfect square if and only
if n = - 1, 0, 1, 2 or 12. Secondly, we shall find it
convenient to consider also the Lucas numbers v n defined
by vi = 1, vz = 3 and v n *2 — o n + i + v n . These, too, may be
defined for all integers, positive, negative or zero.
Notation
At this stage it seems appropriate to explain the notation
which we shall use throughout. All the symbols m, n, x, y,
etc., which occur, except a and p are supposed to be integers,
not necessarily positive; the symbol k wherever it occurs
denotes an even integer, not divisible by 3. The symbol
(x, y) denotes the largest positive integer dividing both
x and y exactly. If *=t=0, then we write x\y \i x divides y
exactly and x <f y otherwise. Finally, if c > 0, we write
SQUARE FIBONACCI NUMBERS
71
a = b (mod c) if and only if c | (a — b). Thus for example,
(18, - 27) = 9; (4, - 5) = 1; (0, - 3) = 3; 4 | 12; 12 H 16;
21 =6 (mod 5); 18 = - 2 (mod 4), and so on.
Preliminaries
We shall need the following result a proof of which
appears in the Appendix.
Lemma. 7/ N is positive, and N = 3 (mod 4) then
^ + 1/2 = (mod N)
is impossible unless y and N have a common factor.
From the definitions of u n and v n the following values are
easily tabulated:
n
Un
V n
— 2
— 1
3
— 1
1
— 1
2
1
1
1
2
1
3
3
2
4
4
3
7
5
5
11
6
8
18
7
13
29
8
21
47
9
34
76
10
55
123
11
89
199
12
144
322
Now let a = h (1 + V5) and = i (1 - ]/5) be the roots
of the quadratic equation 2 = 6 + 1. Then it is a simple
72
EXPLORING UNIVERSITY MATHEMATICS 1
matter to prove the following sixteen formulae; details are
given in the Appendix.
a + = lja0--i, (i)
u n = 5-'/. fc» - /?»), (2)
o n = a" + 0», (3)
2 U„, + n = Un V n + U n D m , (4)
2 D OT + n = 5 ll m U„ + D m D„, (5)
u. n = (-l)^u n , (6)
D_„= (-1)«D„, (7)
*»-*^ + (-l)*-^, (8)
D„ + 12 = o« (mod 8), (9)
2|o m if and only if 3 | m, (10)
3 + d w if 4|m, (11)
(un, d„) =2 if 3 1 n, (12)
(u„, d„) = 1 if 3 + n. (13)
D A - * 3 (mod 4) if 2 | fc, 3 -f It, (14)
v m + 2k= - v m (mod d a -) if 2 | fc, 3 + fc, (15)
U»+2ft = - Um (mod D A ) if 2 | fc, 3 + fc, (16)
The main theorems
Theorem 1. d„ = .v 2 is possible only for n = 1 or 3.
Proof. (1) If n is even, we have by (8)
D n = D n/j * ± 2
for otherwise we would have
SQUARE FIBONACCI NUMBERS
(X + D n/j ) (X- Dn h ) = ± 2,
73
which is impossible, since x — D n/t and x + D„ !t are either
both odd or both even; in the former case the product is odd
and in the latter the product is divisible by 4.
(2) If n = 1 (mod 4), then either n = 1 or n =|= 1. If n = 1
we obtain di = 1 = l 2 . If n=f=l» then n — 1 is a non-zero
integer, divisible by 4. Suppose that 3 r , where r > 0, is the
highest power of 3 which divides n — 1, Then we may write
n - 1 = 2 . 3 r . fc, where 2 | fc and 3 + fc. Thus
D» = D 1+2 3 /- fc
= (- IKd! (mod D A )
= — 1 (mod Da-).
by repeated application of (15)
We see therefore that d„4=;v 2 , for otherwise we have
x z + l 2 = (mod Dk) which is impossible by the lemma,
since by (14) da- = 3 (mod 4) and, of course, da- and 1 have
no common factor.
(3) Finally, if n = 3 (mod 4), then we have exactly as
above, either n = 3 or n4=3. In the former case D3 = 4 = 2 2 ,
whereas in the latter we may write n — 3 = 2 • 3 r • fc, where
r ^ 0, 2 I fc and 3 i fc, and so as before
v n
= i?3+2.s r .A= (~ l)' r D3= - 4 (mod Da-),
by repeated application of (15). Hence again d„^=x 2 , for
otherwise we would have x 2 + 2* = (mod da) which is
impossible by the lemma, since da- = 3 (mod 4) by (14) and
so da- and 2 have no common factor.
This concludes the proof of Theorem 1.
Theorem 2. d„ = 2 x 2 is possible only for n = 0, 6 or — 6.
Proof. In the first place, we must have d„ even, and so by
(10) we can assume that 3 j n.
74
EXPLORING UNIVERSITY MATHEMATICS 1
(1) If n is odd, then since 3 | n, n must leave remainder 3
on division by 6, and so remainder 3 or 9 on division by 12,
i. e. n = 12m + 3 or 12m + 9 for some integer m. Hence by
repeated application of (9) we have
'l2m+9
D « = D 12m+3 0r °l
= i>3 or D9 (mod 8)
= 4 or 76 (mod 8)
= 4 (mod 8).
Thus h u n = 2 (mod 4) and so \ o n 4= * 2 for if x is odd then
a- 2 is also odd, and if x is even x 2 is divisible by 4.
(2) Suppose now that n = (mod 4). Then either n —
or n=)=0. If rc = we have d = 2 = 2 . I 2 . If rc 4= then
as before we may write n = 2 • 3 r • fc, where 2 | fc, 3 H k and r
is a positive integer. Thus just as before we obtain
= (-l)s r p =-2 (mod D k ).
Thus D k divides {v n + 2) = 2 (J d„ + 1); now by (14) v k is
odd, and so since v n is even, £ v n is an integer, and so Dn
divides (h d» + 1), that is
lv n + 12 = (mod d&).
Now, by (14) Dk = 3 (mod 4) and so by the lemma, wc see
that h o n 4= x 2 or o„ 4= 2x 2 .
(3) The remaining case is n = 2 (mod 4), or n = 2 or 6
(mod 8). Consider first n = 6 (mod 8). Then either n = 6
or n 4= 6. If n = 6 then v& = 18 = 2 • 3 2 , whereas if n =}= 6,
we may write n — 6 = 2 . 3 r . fc, where r is a positive integer.
3 'f fc and now 4| A\ Thus as before o n = ( — l) 3r 06 = — 18
(mod D k ). Hence D k divides r> n + 18 = 2 (i d„ + 3 2 ), and
SQUARE FIBONACCI NUMBERS 75
since by (14) Dk is odd, it follows that D/ { divides {h d„ + 3 2 ),
in other words,
I v n + 3 2 = (mod D k ).
Now by (14) Dk — 3 (mod 4), and since 4 | k, it follows from
(11) that 5 ■{ Dk, in other words 3 and D/< have no factor in
common. Hence from the lemma it follows that h D n =t= x 2 ,
in other words D n 4= 2.x: 2 .
Finally, if n = 2 (mod 8) then -n = 6 (mod 8) and by (7)
o tt = d_„. Thus D n — x 2 if and only if D- n = x 2 , and by what
has immediately preceedcd this is possible if and only if
— n = 6, that is n = —6. This concludes the proof.
Theorem 3. u„ = x 2 is possible only for n = — 1, 0, 1,
2 or 12.
Proof. (1) If n= 1 (mod 4), then either n = 1 or n=fcl. If
n = 1 we have u\ = 1 = l 2 . If rc 4= 1 then as before we may
write n - 1 = 2 . 3 r . k, where r > 0, 2 | fc and 3-ffr. Thus by-
repeated application of (16) we have
U n = "l + 2./.A = (- l) 3 ' Ui = - 1 (mod D k ).
Thus u.„ + l 2 = (mod Dk), and so by the lemma u n =£ x 2 ,
since as before by (14) Dk — 3 (mod 4) and D/ : and 1 have no
common factor.
(2) If n = 3 (mod 4), then - n = 1 (mod 4) and by (6)
U- n = Un- Thus u„ = x 2 is possible if and only if U- n = x 2 ,
that is if and only if — n — 1 or n = — 1, by the preceeding.
(3) If n is even, let n = 2 N. Then if u n = x 2 , we have
by (4) with m = n = N,
r2 =
U-2N = UN DN.
There are now two cases;
—
76
EXPL0R1NC UNIVERSITY MATHEMATICS 1
(A) If 3 + JV then by (13) (un, on) = 1 and sot we must
have un — y 2 and dn = z 2 . Now by Theorem 1 the latter is
satisfied only for N = 1 and N = 3; the first of these also
satisfies the former and gives n = 2; the second must be
rejected since we have taken 3 \ N.
(B) If 3\N, then by (12) (un, dn) = 2 and sof we must
have un = 2 i/ 2 and »# = 2 z 2 . Now by Theorem 2 the latter
is satisfied only by N = 0, 6 or —6. Of these values, the
first two satisfy the former whereas u_o = — 8 and so
N = —6 does not satisfy un = 2 y 2 . Hence we obtain exactly
two more values, namely n = or 12. This concludes the
proof.
Appendix
To prove the lemma, we shall first need
Fermat'S Theorem. If p is a prime and p \ z then
zp- 1 = 1 (mod p).
For, we observe that if p is a prime then the (p — 1) binomial
co-efficients pCu p C2, p C$, . . ., pC p _i are all divisible by p,
since if 1 <L r <, (p — 1), pC t is an integer which equals
P(p-l)(p-2)...(p + l-r)
1.2.3...r
and since p is prime, none of the factors in the denominator
divides p since r <. (p — 1). Thus
f Proof in Appendix.
or,
SQUARE FIBONACCI NUMBERS 77
(x + 1)p = xp + pCi xp~ 1 + . . . + PC p -i x + 1
= xP + 1 (mod p)
{x + 1)P - (x + 1) = xP - x (mod p),
and by repeated application of this we obtain
z p — z = (z - \)p - (z - 1) = ... = IP — 1 =0 (mod p).
Thus p divides zp — z = z (zp -1 — 1) for every integer z, and
so if in addition p \ z wc have that p divides zp -1 — 1, or
zP- 1 = 1 (mod p) as required.
Proof of the lemma. We wish to prove
// N is positive and N = 3 (mod 4) tf/ierc ac - + ,i/ 2 =
(mod N) is impossible unless y and N have a common
factor.
Proof. Suppose if possible that iV = 3 (mod 4), that N is
positive, that (y, N) = 1 and that N divides x* + y*. If N
is a prime, write N = p; if N is not a prime we may write N
as a product of primes p\, P2, • • ., Pk some of which may be
repeated. Now since N is of the form 4 m -r 3, it follows
that not all these primes p r can be of the form 4 m + 1 (since
the product of two numbers of this form is also of this form),
and clearly since N is odd all these primes p r are odd.
Thus one at least must be of the form 4 m + 3. Write p for
this prime (or for the smallest such if there be more than
one). Thus in any case, we have since p divides N, p = 3
(mod 4), (p, y) = 1 and x 2 + y 2 = (mod p).
Now since (p, y) = 1 it follows from Euclid's algorithm!
that there exist integers a and b such that ay — bp = 1-
Let z = ax. Then
f See, for example, Chapter 3, p. 36.
7S
EXPLORING UNIVERSITY MATHEMATICS 1
z- = a= x*
= — a 2 y 2 (mod p)
m - {bp + l) 2 (mod p)
= — 1 (mod p),
and so in particular p •f z. Also p = 3 (mod 4) and so
h (p — 1) is an odd integer. Thus
zP -i = ( Z 2)V* (p-D = - i ( m od p)
which is impossible by Fermat's theorem.
This concludes the proof of the lemma.
Proof of (1), (2) and (3). We know that a = h (1 + V*>)
and = J (1 — J/5) and so
a + = 1; a£= - 1 (1)
follows by simple calculation. Also we have by calculation.
ui = 5-'/* ( a - 0) = 1
and
u 2 = 5-'/t (a 2 - £ 2 ) = 1.
Now suppose that for
re = m, and re = rei + 1, u„ = 5 -1 /* (a n — n ).
Then
"m+2 = "m + 1 + Urn
= 5-Vt{a m (a + 1) -/?"' (/?+ 1)}
= 5- , Ma m+2 - m+s ),
since a and /? are the roots of 0- = + 1. The proof of (2)
for re > 1 now follows by induction. To prove (2) for re <L 0,
we observe that
SQUARE FIBONACCI NUMBERS 79
uo = 5-* (a - 0°) = 0,
u_i = 5- , /t( a - 1 -y?- 1 ) = 1.
Now suppose that for re = — rre and re = — m — 1,
u» = J-"A (a n - 0").
Then
u_ OT = u-m-i + M-m-2. and so
U-m-2 = M-w — re_ W) _i
= 5' 1 '* {a'" 1 - a-" 1-1 - 0~ m 4- y?-" 1-1 }
= 5-*/* {a-" 1 " 2 (a 2 - a) - /5~ m - 2 (0* ~ 0))
= 5 -1 /* (a _m - 2 - /?~"»-2)
since a and satisfy 2 — = 1. Again the result follows
by induction. This proves (2); (3) follows in just the same
way.
Proof of (4) ared (5). We now have
Um o n + u„ v m = 5- l '*{(a m - m ) (a n + n ) + (a n - n ) (a* + m )}
= 2 . 5- 1 /' (a m+n - m+n ) = 2 a m + n
and
5 u„, u„ + 9* o n = {a m - m ) (a n - n ) + {a m + m ) (a n + /?")
= 2 (a m+n + m+n ) =2o„ 1 + n .
Proo/ of (6) and (7).
u_„ = 5- ,/ *(a- n -/?-")
= 5- , / ! {(-^)n-(-a)»} by (1)
- (- I)"" 1 v n ,
and
D_ n = a _n + 0~ n
= (-0)n+ (-„)»
= (-D n P„.
80 EXPLORING UNIVERSITY MATHEMATICS 1
Proof of (8). Putting n = m in (5) we have
2 v 2m = 5 u m ! T D m 2
and putting n = — m in (5) we have, in view of (6) and (7),
4 = 2D =(-D'*- 1 (5u /n 2 -D w s). (i)
Thus 2 D2m + ( — l) m 4 = 2 p TO 2 , which on rearrangement be-
comes (8).
Proof of (9) and (10). Putting m = 12 in (5) we obtain
SQUARE FIBONACCI NUMBERS
81
thus
2 D n + 12 = 5 Uj£ U n + D12 P„
= 720 u„ + 322 p„
On + 12 = 360 U* + 161 p„ = p„ (mod 8).
Thus since 03, dq, P9 and P12 are the only even p r for
1 ^ r ^ 12, it follows that 2 | p OT if and only if 3 | p w .
Proo/ of (11), (12) and (13). Suppose that 4 j m, that is
m = 2 M, where M is even. Then by (8)
v m = dm 2 - 2,
and so 3 | o m would imply that 3 | (p ro + 3) = pa< 8 + 1, and
this is impossible by the lemma, since clearly (1, 3) = 1.
This proves (11).
To prove (12) and (13) we observe that if d = (u„, p„) then
d 2 divides 5 n* 2 - o n 2 = ± 4 by (i). Hence d = 1 or 2. Also
by (i) u n and v n are either both odd or both even, and so
d = 1 if o n is odd, and d = 2 if o n is even, that is by (10).
d = 2 if 3 I n and d = 1 otherwise. This concludes the proof
of (12) and (13).
Proof of (14). If 2\k and 3 + fc, then fc = 2, 4, 8 or 10
(mod 12), and so by (9),
Dk — 1>2, D 4t *>8 or D10 (mod 8)
= 3, 7, 47 or 123 (mod 8)
= 3or7 (mod 8)
= 3 (mod 4).
Proof of (15) and (16). If k is even, then by (4) putting
n = m = k,
U2k = u k v k = (mod p a ),
and by (8) v»k = o/. 2 — 2 = - 2 (mod P*),
and so using (4) and (5) we obtain
2 Dm* 2 fc = 5 U m Iff* + D m D2k = ~ 2 D m (mod D/t),
am
2 U OT + 2/j = Um»2A + U2AOm= ~ 2U„, (modD/t),
and we have (15) and (16) on dividing by the factor 2, which
is legitimate since by (10) d* is odd.
Proof of statements on p. 76. Suppose that ab = x 2 and
(a, b) = 1. Then suppose if possible that there exists a prime
p which occurs in the factorization of either a or b (say a)
raised to an odd power, 2 k — 1 say, where k i> 1. Since p | a
and (a, b) = 1 it follows that p t b, and so p 2 k ~ 1 1 ab but
/>-' fc -I ab. Hence p 2 k_1 1 a: 2 but p 2 k % x 2 , which is impossible
since if p tk ~ l I x 2 then p k | x and so p 2fc |^; 2 . Hence every
prime factor o:' either a or b occurs in the factorization of a
or b respectively raised to an even power and so we must
have a = A* and b = B 2 .
Now suppose that ab = x 2 and (a, b) = 2. Then 2 | a, 2 | b,
so 4 I x 2 hence 2 I x. Let a = 2 a, b = 2b' and * = 2 *'.
82
EXPLORING UNIVERSITY MATHEMATICS 1
Then a' b' = x' 2 and now (a', b') = 1. Thus by the previous
result, a = A s and b' = B 2 and so a = 2 A 2 and b = 2 B 2 ,
which was to be shown.
CHAPTER 6
Digital Computers
and their Applications
M. Levison
In an age of major technological advances, there can be few
with such far-reaching potential effects as the development
of the electronic digital computer. In a span of less than 20
years, the computer has emerged from the laboratory to
become an extremely powerful tool touching the everday
livcs of a large part of the population. The pay-slip of the
factory-worker, the rate-bill of the householder, the house-
wife's grocery list, the schoolboy's examination marks — these
are but a few of the many items which at some stage in their
existence may be processed by computer.
One of the less happy results of the computer's phenomenal
rate of progress has been the air of magic and mystery
which has grown up around it. Yet there is no fundamental
reason why this should be so, for the basic principles
underlying it may be readily understood by the layman
with little knowledge of mathematics and none at all of
electronics.
The present article is in two parts. The first of these
describes what a computer is and explains the purpose of
its various individual sections; the second outlines briefly
a few of its many current applications, to give the reader
some impression of the extent of the computer's vast potential.
85
84
EXPLORING UNIVERSITY MATHEMATICS
DIGITAL COMPUTERS AND THEIR APPLICATIONS
85
Let us begin with a description of the computer and the
five basic constituent units of which every computer is built.
In order to describe the part which each of these units plays
in a calculation, it will be convenient to draw an analogy
with a clerk working in a company pay office, whose weekly
task is to calculate the amount of pay due to each of the
company's employees, given certain items of data, such as
the number of hours each employee has worked that week,
his rate of pay per hour, his income-tax code number, and
so on.
Consider then the fundamental activities which the clerk
must undertake in carrying out these calculations. The first
operation in evaluating each individual's pay will probably
be to multiply his hourly rate of pay by the number of hours
worked in order to arrive at his gross pay for the current
week. Thus the clerk will have to be able to carry out simple
arithmetic operations, such as addition, subtraction and
multiplication. For this purpose he may be provided with
a pencil and paper, or preferably, for greater speed and
accuracy, with a desk calculating machine.
Another task which the clerk must perform is to control
the "flow" of the calculations, that is to say, he must decide
which arithmetic operations to carry out at each stage, on
which data the operations are to be performed, and what to
do with each of the results he obtains. This he does by
acting in accordance with a set of instructions which he has
previously been given, perhaps by bis boss on his first day
in the pay office, or perhaps (in effect) by a school
mathematics teacher some year earlier. This set of instruc-
tions the clerk has throughout the calculation to retain in his
"memory", either literally or artificially, by committing them
to a piece of paper which he can read whenever necessary.
In the course of the calculation, certain intermediate results
may be produced which will be required for further compu-
tation at a later stage. These, too, must be recorded in the
(natural or artificial) memory of the clerk. A further task
which the clerk may be called upon to perform is that of
taking certain very elementary decisions. In order to achie\p
greater flexibility in devising the plan of a calculatioiiv it
may be found necessary that the list of instructions given to
the clerk should contain certain branch points. He may, for
example, be told that if an employee's tax code number
exceeds 50 then he is to obey one set of instructions, while
if it is less than or equal to 50 then he is to obey an alternative
set. The clerk must therefore have the ability to decide the
relative size of different numbers.
The items described above are in fact the basic ingredients
of any calculation, and each of the clerk's activities has its
counterpart on the computer. Like the pay clerk, the com-
puter, too, must be able to carry out arithmetic operations.
For this purpose all computers are provided with what is
called an arithmetic unit. Pairs of numbers can be sent to
this unit, and their product or their sum or their difference
or their quotient can be obtained from it. It would be quite
possible to use the arithmetic unit of a computer in exactly
the same way as one uses a desk calculator, with the human
being sitting at the machine sending numbers to the
arithmetic unit and receiving the appropriate results back
again. It would, however, be very uneconomical to build
an expensive unit capable of adding and subtracting numbers
in times of the order of 1 millionth of a second (on the most
recent machines), if the operator is only going to be able to
insert the numbers at a rate of about one every 5 or 10
seconds. The computer is therefore provided with its own
control unit and memory unit. Into the memory unit the
list of instructions (called a programme) is placed by the
human operator before the calculation begins. The control
unit then takes these instructions and carries them out one
at a time, sending numbers when appropriate to the
arithmetic unit and dispatching the results received to their
correct destinations. The memory unit may be used not
only for storing the instructions which the computer is to
86
EXPLORING UNIVERSITY MATHEMATICS 1
obey, but also for storing the intermediate results which the
human being might write down on paper. Like the human
pay clerk, the computer has an element of discriminatory
ability. It, too, can compare two numbers stored in its
memory, decide which is larger (or whether they are equal),
and follow alternative paths of instructions according to the
result obtained.
So far then we have encountered three of the basic,
constituent units of the computer. There are two others to
consider. The first of these is its input unit, which enables
the computer to receive information, both instructions and
data, from the outside world. Information to be communi-
cated to a human being must be passed to him by way of
his sensory organs, most frequently his eyes or ears. In the
case of the pay clerk, he will receive his instructions and
data either in the form of the spoken word, or in the shape
of a document which he may read, or possibly in braille
which he can detect by touch. The sensory organs of a
computer usually take the form of devices capable of
detecting photoelectrically the presence or absence of holes
punched in a card or a length of paper tape. The hole
patterns, which may be interpreted as binary digits, are
transmitted by the device to the computer memory.
Now the human being is taught at an early age to recognize
certain groups or patterns of sounds which make up his
native language and to assign to them certain specific
meanings. By using this language, any other person may
readily communicate with him. In the case of the computer
certain patterns of digits have a built-in meaning as instruc-
tions or data determined by the computer's electronic circuits.
These form the basic "language" of the computer and we
may communicate information to the computer by expressing
it in this language.
The fifth of the basic units of the computer is its output
unit, whereby it may convey information, for example, the
results of its calculation or details as to how the computation
DIGITAL COMPUTERS AND THEIR APPLICATIONS
87
is progressing, back to the outside world. For this purpose
the computer may perhaps be made to punch further holes
in cards or in paper tape, or to operate some form of electric
typewriter. A functional diagram of a computer illustrating
the relationship between the basic units is shown in Fig. 6.1.
CMINt
HI
iwu:
urni
OUTNT
UNIT
Renin d
CUMl'tML i
l«HUl <l WttMI.C
•lenliori
MMMM ■
'Uaill ••
Maim *.mfr*'t
I'lG. 6.1. Diagram showing the basic units of a computer,
and their interconnections. In addition to the connections
shown, all units receive control signals from the control unit.
To gain some idea as to what is involved in expressing
the plan of a calculation in the language of the computer,
we return once again to our analogy with the pay clerk.
Once he has calculated the gross pay of any employee, his
next task is to perform a more elaborate series of calculations
resulting in the deduction of PAYE income tax. Now the
actual means whereby such deductions are assessed are very
complicated, and to understand them fully involves a careful
study of successive Finance Acts. In order, therefore, that
the system may be operated by employers not expert in tax
affairs, the Inland Revenue prepares a special booklet
comprising a set of instructions and also some tables of
figures. For each employee, there is provided a tax code
number, determined by the allowances to which he is entitled,
and a card divided into eight numbered columns.
The instructions, expressed in a slightly simplified form,
begin as follows:
88 EXPLORING UNIVERSITY MATHEMATICS 1
(1) Write the employee's gross pay for the current week
in column 2 of the card.
(2) Add this to the entry (made the previous week) in
column 3, writing the sum back in column 3.
(3) Obtain from the tables the amount af tax-free pay to
date corresponding to the particular date and tax
code number, writing this in column 4.
(4) ...
The instructions continue in a similar manner until ultimately
at the end of the calculation, the amount in column 7 is the
tax due for the current week, while a further subtraction
gives the employee's pay after tax has been deducted.
What has in fact been achieved is that the elaborate
process of evaluating the income tax due has been broken
down into a series of simple operations, such as recording
a number on a card, adding two numbers together, and so
on. The pay clerk has in effect been programmed by the
Inland Revenue in much the same way that a computer must
be programmed by a human user. The computer is capable
of certain basic elementary operations, such as recording
a number in some given cell of its memory, adding together
two numbers stored in its memory, and so on. More
complicated operations must be broken into steps such as
these. Nevertheless, using only these very simple operations,
one can build up the most complicated and complex
calculations. Indeed, it is possible to programme the
computer to manipulate not only numbers, but also symbols
and alphabetical data, for, as we shall see later, it is quite
simple to represent the latter by some numerical form of
coding.
The reader may perhaps be tempted to ask whether the
human user might not be able to perform the calculations
himself in the time taken to write so detailed a programme
DIGITAL COMPUTERS AND THEIR APPLICATIONS
89
of instructions. The analogy with the pay clerk again
provides the answer. It is clear that he need only be given
instructions on pay calculations for one of the employees
Thereafter, he may evaluate the pay of any other employee
simply by repeating the same set of calculations, but with
fresh data applicable to that employee: a different rate of
pay, another tax code number, and so on. In fact a great
many calculations contain repetitive parts like this. Fur-
thermore, it is found that there are many sequences of
computer instructions, carrying out some particular operation
which occurs over and over again. One may, for example,
want to calculate the square root of a number many times
during a calculation, and in many different calculations.
Exactly the same sequence of computer instructions can be
used each time. Sequences of calculations of this type are
known as subroutines. Over the course of time a great set
of these subroutines may be gathered together, thus sub-
stantially reducing the time taken to programme a new
calculation.
By this stage the reader should have at least a rudimentary
picture of the structure and the functioning of a digital
computer. Let us now turn to considering some of the
purposes for which computers are being used.
The bread and butter applications of computers are to be
found mainly in the commercial world. We have already
seen that a computer may be used to carry out pay-roll
calculations, and there are many other programmes that fall
under the heading of business applications. Computers, for
example, are now used by many large business concerns to
keep an inventory of their trading stock. This inventory is
updated each day or each week, and the computer reports
when the quantity of any stock item falls below a critical
level so that the goods may be re-ordered. In some cases
the comptitcr is also provided with such details as the cost
of each item, the time delay in receiving ordered goods, and
the average frequency with which the item is requested by
90
EXPLORING UNIVERSITY MATHEMATICS 1
customers. It is then programmed to determine the best
balance between a heavy cash outlay in re-ordering stock
and loss of trade through not having the stock available
when called for.
Banks, too, have begun to use computers to file and update
details of their clients' accounts. Evidence of this is to be
found in the appearance during recent years of curiously-
shaped figures at the bottom of many cheques. These figures
are carefully designed to allow them to be read directly
into a computer by a special piece of apparatus, without the
need for preparing punched cards or paper tape. At the
present time the majority of banks use these figures to
record only the code number of the branch. At least one
major bank, however, has reached the stage of recording not
only the branch number but also the cheque number and
the customer's account number, while the amount of the
cheque is also inserted once it has been used and returned
to the bank. The entire processing of the cheque may then
be handled by computer. It is only by adopting techniques
such as this that banks will in the future be able to keep
pace with the growing volume of business which they will
be called upon to undertake.
Calculations of the type just mentioned are often referred
to as data processing. Data processing operations are
characterized by the fact that a large quantity of data is
fed into the computer, a large quantity of results are output,
but only a fairly simple series of calculations are performed
on them. Thus, although any computer whatever would be
capable of performing the calculations, these would be done
most efficiently on a machine with extensive input and
output equipment and only a simple arithmetic unit. In
contrast, there are many other problems in which the amount
of input and output is light while the quantity of computation
is very large. For such problems it may be more efficient
to use a computer with a more complex arithmetic unit but
with fewer input and output devices. Larger establish-
DIGITAL COMPUTERS AND THEIR APPLICATIONS
91
ments with varied problems may, of course, possess
computers having the best of both worlds. Such computers
will carry out efficiently all types of problems.
Computers are also used extensively for calculations in
the scientific and engineering fields, including both calcu-
lations hitherto undertaken by human endeavour, and
calculations which on account of their size could not
previously have been contemplated.
To give an example, in recent years a number of individuals
have been launched both by the USSR and the USA to orbit
the earth in artificial satellites. None of these launchings
could have taken place without the assistance of the
computer at a great many points. A particularly important
task for the computer occurs in the moments immediately
after the rocket has been launched, for it is then essential
to determine as rapidly as possible, whether the satellite
will enter a satisfactory orbit. For this purpose, the position
of the rocket must be carefully observed at short intervals
of time during the first few seconds of flight. These
positions may then be substituted into the appropriate
equations, and details of the orbit calculated. Were this
task to be undertaken by a team of mathematicians, a great
many hours or even days would elapse before the solution
was known, by which time the unfortunate astronaut might
have landed far from the nearest rescue ship in the middle
of the Pacific Ocean. With the help of a computer, however,
details of the orbit may be determined within seconds, and
the astronaut ejected from his rocket if the orbit is not
satisfactory.
A situation of a similar type occurs in weather prediction,
where barometric observations, etc., may be used to set up
equations describing the motion of the atmosphere. These
equations must, however, be solved within an hour or two if
the results are to have any value in forecasting weather
conditions before they actually occur; so that, here again,
the use of a computer is essential.
92
EXPLORING UNIVERSITY MATHEMATICS 1
In addition to their use in numerical calculations, com-
puters have also been applied to the translation of languages.
The reader may well wonder how a machine designed
primarily to perform calculations on numbers may carry
out operations on the words of a language. This may be
quite simply achieved, however, by coding the linguistic data
into a numerical form. We may, for example, choose to code
letter A as 01, B as 02, C as 03, . . ., Z as 26. Then a word
such as CAT would be represented by the number 030120,
and BEAST by 0205011920. Once coded in this way, the
words may obviously be stored in the computer memory
and may be compared for equality or alphabetical order,
just as numbers are compared for size (indeed, if we adopt
the coding system just proposed and if the numbers are
aligned by the left, then the alphabetical order of the words
is the same as the numerical order of the numbers represen-
ting them).
A rudimentary form of translation might proceed as
follows. In the computer memory there would be stored a
long list or "dictionary" of, say, French words. In adjacent
memory cells would be stored for each of these its English
equivalent. Given a text in French, each word would be
read into the computer, looked up in this French dictionary
(possibly by comparing the text word with every dictionary
word in turn until an equal one is found, though there are
much quicker methods), and the English equivalent output.
In this way, the computer would produce a word for word
translation of the given text.
Now as a translation this effort would be of little use.
It is apparent, however, that the machine might go much
further; for one can readily store in the computer memory
not only words but other coded information, such as
grammatical details, and so on. Opposite each French
word in our dictionary, we might have not only its English
equivalent, but also some information telling us whether this
particular word is a noun or an article or an adjective or
DIGITAL COMPUTERS AND THEIR APPLICATIONS
93
a verb. Instead of reading the text a word at a time and
printing its translation, the computer might read an entire
sentence at a time and look up each of the words in the
dictionary. It might then make various adjustments to these
words according to the grammatical information. If in
French, for example, the computer detected a noun followed
by an adjective, it could be made to invert the equivalents
to their more normal English order.
Close study of the translation problem reveals a number
of other difficulties many of which have yet to be overcome.
They form, however, an interesting field of current research.
Apart from the actual translation of languages, computers
have also been used in the furtherance of literary studies; to
help in the preparation of the detailed word-indexes which
literary scholars call concordances, to collect from texts
statistical data which may be used in the study of authorship,
and to assist in the jig-saw-like problem of reconstructing
small fragments into manuscripts.
Another problem area to which computers have been
applied is the playing of games. The games concerned fall
into one of two groups. For some, such as "noughts and
crosses" and "Nim", a complete solution is known, either in
the shape of a set of rules ensuring a win for one side, or
(which amounts to much the same thing) in the form of an
exhaustive analysis of all possible positions. Computer
programmes to play this type of game present few difficul-
ties, and are of little real interest.
Rather more interesting are those games, such as draughts
and chess, for which no winning system is known and for
which the vast number of possible positions makes exhaustive
analysis wholly impractical. The draughts-playing pro-
gramme of A. L. Samuel is a good example. This plays by
looking at all possible positions two or three moves ahead,
and pursuing even further those positions in which develop-
ments (i. e. exchanges or jump moves) are imminent. The
resultant positions are then evaluated in terms of features
94
EXPLORING UNIVERSITY MATHEMATICS 1
such as material advantage, mobility of pieces, and so on,
the computer selecting the move which leads to the best
score available, subject to the assumption that its opponent
will attempt the opposite.
Two other noteworthy aspects of this programme are that
it retains in its memory details of board positions encountered
in play for use in later games, and that it can, in the light of
success or failure, modify the weights attached to the
different features used in evaluating positions. Thus, after
a number of games, the programme "learns" to play a better
game. Indeed such was the improvement shown in the skill
of Dr. Samuel's programme, that it was able to defeat one
of the leading players in the United States.
The behaviour of this programme bears a close resemblance
at a number of points to that of a human being. It is in
fact one of a group of programmes which have been written
with the intention of imitating human behaviour. Apart
from their considerable entertainment value, the more serious
motive behind them is an attempt to discover more about
the working of the human mind. To mention but a few
examples, programmes have also been written to play chess,
to prove theorems in logic and pure geometry, and so on.
Naturally in the limited space available this article has
been able to discuss only a mere handful of the very many
fields to which digital computers have been and are being
applied. It is hoped, however, that this will have been
sufficient to give the reader some idea of the vastness of
their scope and potential, so that he may be better able
to appreciate the impact which computers will make on the
world of the future.
Reference for further reading
Samuel, A. L., Some Studies in Machine Learning using the Game
of Checkers, in Computers and Thought (ed. E. A. Feigenbaum
and J. Feldman), McGraw-Hill, New York, 1963.
CHAPTER 7
The Isoperimetric Problem
H. G. Eggleston
The isoperimetric problem can be described as follows.
We consider a closed curve which does not overlap itself
and which lies in a plane. Fig. 7.1 (a) shows examples
of the types of curve that we have in mind; Fig. 7.1 (b)
D
m
Fig. 7.1.
95
96
EXPLORING UNIVERSITY MATHEMATICS 1
shows examples of curves that we do not consider. The
curve will divide the plane into two parts of which one is
enclosed by the curve and the other is not. The points in
the part of the plane enclosed by the curve form a set which
we refer to as the "enclosed set". We can now state the
isoperimetric problem which is, If we know the length of
the curve, what is the largest possible value for the area of
the enclosed set? In other words we consider the class of
all such closed curves which have a given length and we
wish to describe, in some form or another, that curve which
encloses the largest possible area. In very simple cases we
can actually calculate the area; for a square, equilateral
triangle or circle of perimeter length L the enclosed set has
area L 2 /16, L 2 /12]/3, Dl±n respectively. But to solve the
isoperimetric problem we must consider curves for which it
is quite impossible to calculate the area of the enclosed set,
and we need quite different techniques.
This problem is an interesting one for several reasons. It
illustrates in dramatic form the difference between asking
a question and finding the answer, or rather of establishing
the validity of the answer. The problem had occurred to
the Greek geometers of over 2000 years ago but its complete
solution was not discovered until the 1880's. This solution,
which is that the curve must be a circle, was known to the
Greeks, but it is one thing to know what the solution of a
problem is and quite another to establish this solution in an
adequately convincing form or, as we say, to prove it
mathematically. Another reason for being interested in
this problem is that although formulated in mathematical
terms that are relatively concrete and close to physical
reality, yet its solution depends upon abstract concepts that
were not defined until comparatively modern times. Again
the techniques developed to solve this problem are applicable
in a wider context for attacking a whole range of problems,
called by the generic title of Extremal Problems. Finally,
the problem is one of a very general type in mathematical
THE ISOPERIMETRIC PROBLEM
97
work, that of establishing some relation or connection
between a property of a set (in this case the enclosed set),
and a property of its frontier, (in this case the curve).
One of the difficulties, which the Greeks came across, with
this problem, was that of providing a suitable mathematical
formulation. One of the difficulties in mathematics, as in
most other intellectual disciplines, is knowing where to start.
The Greeks started from geometrical concepts. Their basic
elements were geometrical ones; points, lines, planes, and so
on. Greek geometry, enunciated with such remarkable
lucidity by Euclid and his fellow workers, dominated
mathematical thinking and teaching in post-Renaissance
Europe even until the beginning of the present century. The
greatest mathematician of all time, Isaac Newton, felt that
he must write mathematics in a geometrical form. He made
discoveries by using his own invention, the calculus, and
was able to prove results that would have been quite beyond
the capacity of Euclid and the Greek geometers, but having
once got his results, he felt that he had to clothe them in
geometrical language. Somehow this made the most
revolutionary discoveries respectable. At a later stage
when Gauss had discovered non-euclidean geometries, he
dared not publish these terrible heresies, because of a well-
founded fear of the ridicule of his fellow mathematicians.
Now, however, all is changed. Euclid is no longer a
textbook at school or at the university, and we do not publish
mathematical papers in geometrical form. Geometry of the
euclidean type is definitely out, and non-euclidean geometry
is equally definitely on the way out. We now base the
whole of pure mathematics on the concept of sets. Sets of
what? Well, not sets of anything very specific. Mainly
sets of numbers and sets of sets of numbers and sets of sets
of sets of numbers, and so on. Indeed, the idea of a numbeT
can itself be defined in terms of sets. We need to make
various precise assumptions about the nature of a set. When
these have been made we believe that the whole of pure
93
EXPLORING UNIVERSITY MATHEMATICS 1
mathematics can be deduced from them by entirely logical
arguments.
It may, of course, be arguable at any one time as to what
is logical and what is not, but there is generally a consensus
of opinion which will agree as to what is valid and as to
what is not valid, even though these ideas of validity change
from time to time and what we consider to be a valid proof
now, may well be regarded by our mathematical successors
as being merely a primitive approximation to a logical
argument.
Our problem, then, has to be assessed in terms of mathe-
matical concepts. We have used the expressions closed
curve, enclosed set, area, length and also the idea of largest.
Each of these concepts is to be translated into mathematical
language in a rigorous way. When this translation has been
done we can only operate mathematically on our new concepts.
It is no longer permissible to use any of the physical pro-
perties from which we abstracted to secure the mathematical
analysis. Once we have defined our mathematical ideas
we leave the physical world behind completely; we live
exclusively in a mathematical world which has nothing in
common with reality and in which it is completely inad-
missible to do anything except mathematics. It may be
wondered that in this situation, mathematics ever produces
results which have any contact with the physical world at
all. If the arguments that we use are divorced from
physical reality, it does seem rather surprising that the
results should yet have physical applications, but we can
look at the matter rather in the light of a translation from
one language into another. If we regard physical pheno-
mena as being some sort of a language and a mathematical
abstraction of these phenomena as being some sort of a
translation of them, then it is feasible that the process of
translation into mathematics and of the use of logical
arguments in that mathematics should produce results which
THE ISOPERIMETRIC PROBLEM
99
can be re-translatcd into an interpretation of the physical
environment from which they were first abstracted.
One of the geometers who worked on the isoperimetric
problem in the last century was Jacob Stciner. He produced
a number of solutions whose validity was later challenged
by Weierstrass on the ground that they contained an
unproved assumption. Although there was this gap in his
arguments, Steiner's ideas have proved to be very important
in this and allied problems.
Basically Steiner's idea was as follows. Firstly, be believed
(correctly) that the solution to the problem was a circle.
Suppose that we denote the circle whose perimeter length
is L by C then the area enclosed by C is L 2 /4jt. Consider
some other closed curve of length L, say K, enclosing a set of
area A. If we can show (L 2 /47i) ^> A whatever K is considered,
then we shall have solved the isoperimetric problem. But
this is difficult; we cannot establish this inequality directly
because C and K may be of very diverse shapes and
quite different from one another. Instead of making the
comparison from K to C directly, let us try to go by small
steps from K to C. Suppose that we could find a new
curve K* of length L enclosing an area A* so that A* ^ A,
and suppose further that if K is not a circle then we can
find K* so that A* > A (a strict inequality), then shall we
not have solved the isoperimetric problem? Steiner thought
that we should, but actually there is a gap in his argument.
If we were able to establish the situation described above,
then we should have proved the assertion "If K is not a
circle then K is not a solution to the isoperimetric problem"'.
This is logically equivalent to "If the isoperimetric problem
has a solution then that solution is a circle" but it is not
logically equivalent to "The circle is the solution of the
isoperimetric problem". One has to assume that the iso-
perimetric problem actually has a solution; that is to say
one has to assume that there actually is a curve that has
length L and, of all such curves, encloses the largest possible
100
EXPLORING UNIVERSITY MATHEMATICS 1
area. This may not be the case at all. In fact there are
similar questions which do not have a solution in this
sense. For example, the Kakeya problem asks "What is
the least area of a plane set, which has the property that
a segment of unit length can be rotated completely round
inside the set through 360° back to its original position?"
In fact given any positive number s one can find a plane
set of area less than e having this property, but one cannot
find such a set with s replaced by zero. Thus the above
question is wrongly phrased. In the Kakeya problem there
is no set of least area.
Perron has illustrated the gap in Steiner's argument in
a striking form. We know that there is no largest positive
integer or whole number. If we assume for the moment
that there is such a largest positive integer then we can
prove that it is 1 by an argument similiar to Steiner's. We
consider any integer n and transform n to n 2 . Now n 2 ^ n
and if n=j=l then n 2 > n. Hence no integer, except possibly
1, is the largest. Therefore 1 is the largest possible integer.
This nonsense follows from the false assumption that there
is a largest possible integer. Steiner's assumption that there
was a solution to the isoperimetric problem was not false but
did need to be established as correct. In fact this has been
done and we use the concept of compactness to determine
a wide class of problems which we know have solutions.
Although Steiner's argument had this gap, his idea of
approach has proved a very fruitful one in many fields. We
shall now assume that the isoperimetric problem has a
solution and consider how Steiner put his programme into
effect. For simplicity we shall restrict our consideration
to convex curves. They will be closed curves in which the
direction of the line of motion (or tangent) rotates in one
and the same sense as we describe the curve once. Figure
7.2 (a) gives examples of convex curves, Fig. 7.2 (b) exam-
ples of non-convex curves, and Fig. 7.2 (c) a curve which,
although it satisfies this condition on the rotation of the
THE ISOPERIMETRIC PROBLEM
101
(ai Convex closed curves
(hi Non-convex closed carves
(ci A curve that satisfies Ihe
condition on the sense ot
rotation ot Ihe directed tangent
Out is not closed
Fig. 7.2.
tangent, yet does not qualify for consideration, since it is not
closed. A property of a convex curve that we shall assume
is that a straight line either (i) does not meet it at all, or (ii)
meets it in one point, or (iii) meets it in two points or (iv)
meets it in a segment (Fig. 7.3).
Given the convex curve K of length L Steiner's aim was
to find a second curve K* of length L enclosing an area that
102
EXPLORING UNIVERSITY MATHEMATICS i
Fig. 7.3.
was strictly larger than that enclosed by K, unless K was
actually a circle. How was he to define K* from K? He
considered the properties of circles, particularly those which
characterized circles amongst all convex curves, and then
used the fact that K, if non-circular, did not satisfy these
properties. One such property that circles possess is that
J he angle in a semicircle is a right angle. We can phrase
THE ISOPERIMETRIC PROBLEM
103
this as follows: if C is a circle and a, b are two points on
C such that the two arcsf into which a, b divide C are of
equal length then any point c on C is such that Z. abc = \n.
Moreover this property characterizes circles, if a, b are given
fixed points and c a variable point such that ^ abc = I n,
then c lies on a circle of which a and b are diametrically
opposite points. Suppose then that K is not a circle and
that p and q are two points on K dividing it into two arcs
of equal length then there must be a point r on one of these
arcs such that ^L prq =j= \ n. (See Fig. 7.4.)
Denote the two arcs of K with end points p and q by
A'i and K z . Suppose that K x is that arc which contains r,
and that it is itself divided into arcs K u , £12, by r, where
£]] has end points p, r and K i2 has end points r, q. If we
move q along the line pq to q x and r on the circle centre p
radius pr to r x so that distance n q, = rq, then we can
construct arcs congruent to K n joining p, rj and congruent
to Ki 2 joining n, q h These two arcs form an arc Kf with
a total length equal to that of £, and enclose/with segment
pqu an area (say T) which differs from the area enclosed
by segment pq and K x by exactly the amount that the area
of the triangle pn q x differs from that of prq. If ^ prq > h 71
take qi nearer to p than q; if ^ prq <h n take g, further away
from p than q. In either case if q x is near q the area of the
triangle pn q\ exceeds that of the triangle prq.
Thus the arc Kf has the sort of properties that we require,
but of course we still have to modify the arc K z . We can
do this in an exactly similar way provided that we can find
a point s on it with ^ psq =|= J n and such that if ^ prq > \n
then ^_ psq > In whilst if ^ prq < %n then ^ psq < hx. This
could certainly not be done if K 2 was a semicircle. So
Steiner put a preliminary argument in to cope with this
t By arc we mean a connected part of a closed curve. It need not
be an arc of a circle. Any two points on a closed convex curve
divide the curve into two arcs.
104
EXPLORING UNIVERSITY MATHEMATICS 1
possibility. He observed that he could choose the points
p, q on K so that (i) they divided K into two equal length
arcs as before and (ii) neither of these arcs was a semicircle.
To see this we have to show that if K is a convex curve
with the property that when we divide K into two equal
length arcs by two points p, q, then one at least is a semicircle,
then K itself is a circle. Take any two points p, q dividing
K into two arcs K\, K 2 of equal length. Suppose K 2 is a
semicircle. If K\ is a semicircle then K is a circle. If K\
is not a semicircle there is a point t on K\ that does not lie
on the circle of which £2 is a semicircle (Fig. 7.5). Let u
be the point of K such that tu divides K into two equal
length arcs. Denote these arcs by £3, K,\. By hypothesis at
least one of K3, K 4 is a semicircle. Each of these arcs has
an arc in common with £2 and therefore, if it is a semicircle,
lies on the same circle as K 2 does. Thus t lies on this circle.
This is a contradiction, which shows that our original
assumption was false, and K was in fact a circle.
Thus Steiner's subsidiary argument is proved. Given that
K is not a circle we can find two points p, q dividing K into
two equal length arcs K\, K 2 neither of which is a semicircle.
THE ISOPERIMETRIC PROBLEM l05
Next, in order to make sure that we can pick r on K t and
s on K 2 with Z.prq, </psq both greater than In or both
less than hn, we select that one of K u K 2 that with segment
pq encloses the larger area (if both enclose the same area take
either) and reflect it in line pq. Suppose that we had selected
Kt and that £1 and its reflection K\' in pq form the closed
curve £3 (see Fig. 7.6). Take r on K t as before, r' its reflection
Fie. 7.6.
in line pq on K% f . Move q along pq and define £,'* from K\'
in precisely the same way that K x * was defined from £,.
The two arcs K x * and K\* together form a closed curve K 4
that encloses a larger area than that enclosed by K and is
of length L.
K 4 need not be convex. It may fail to be convex at only
the four points p, q, n, n'. We can replace part of the
curve K 4 near each of these points by a straight segment
(the operation at p is shown in Fig. 7.7) to produce a convex
curve £5 whose length is less than or equal to that of K A
(because a straight line segment is the shortest distance
between two points) and enclosing an area at least as large
as that enclosed by K 4 .
If the length of K 5 is L u expand K 5 by a similitude with
the ratio LlL\ [transforming the point (x, y) of the plane
into the point {{LlU) x, {LIU) y)] to obtain a new curve K*.
106
EXPLORING UNIVERSITY MATHEMATICS 1
THE ISOPERIMETRIC PROBLEM
107
Fic 7.7.
Finally, K* is of length L, is convex and encloses an area
larger than that enclosed by AT. Thus Steiner's procedure
is completed.
An alternative method devised by Steiner was to symmet-
rize the set concerned. We make use of the fact that any con-
.
Fig. 7.8.
vex curve encloses a convex set, and that such a set meets a
straight line in either a point, a segment or in no points at all.
Consider a convex curve K and a straight line X (see Fig.
7.8). For each point p on X denote by m p the line through
p and perpendicular to X. This line meets the set enclosed
by K in (i) no points, or (ii) one point, or (iii) a segment.
Take on m„ in case (i) no points, in case (ii) the point p
and in case (iii) the segment whose midpoint is p and whose
length equals the length of the segment in which m p meets
the set enclosed by K.
We do this for every p of X. The totality of all the
constructed points forms a set that is enclosed by a convex
curve denoted by K . The proof of this is omitted. It can
also be proved that the length of K ? does not exceed that
of K, and, indeed, that the length of K x is strictly less than
that of K unless K x is itself symmetric about some line parallel
to X (in which case K x is a translation of K). Here "symme-
tric about a line" means "coincides with its reflection in the
line
I shall not prove any of these statements. But if we assume
that they are true, and if we assume that there is a solution
to the isoperimetric problem, then the properties imply the
following. If, say, the curve T is a solution of the isoperi-
metric problem and we take a particular direction then
there is a line l lying in the direction 6 such that f
coincides with its reflection in l . Take two perpendicular
directions and let the corresponding lines /,,, l v+x/t intersect
at O (see Fig. 7.9). Then all the lines l pass through O.
For, if this were false, select one l not passing through O
and let the line through O perpendicular to this l Q meet f
in p and q. Since F is symmetric about Iq one of p, q is
nearer to O than the other, say p nearer than q. Since r
is symmetric about l v and l v+3t/i , is convex and contains q, it
also contains the rectangle with centre O and one vertex at
q. But p is an interior point of this square which is
impossible as p lies on f itself. This is a contradiction.
10S
EXPLORING UNIVERSITY MATHEMATICS 1
Fig. ?.9.
Hence Iq must pass through O and Iq is simply the line
through O in the direction 0. T is symmetric about every line
through O.
But then T must be a circle, centre O. Otherwise there
are two points h, j on T whose distances from O are not
equal (see Fig. 7.10). Take the line I which is the internal
bisector of angle jOh. r is symmetric about /. Thus t passes
through points j, h', h, j' in some order, where h' and j' are
the reflections of h, j in I. This contradicts the convexity
Fig. 7.10.
THE ISOPERIMETRIC PROBLEM
109
of r, but I shall not give a detailed proof of this. Thus
again we have a solution of the isoperimetric problem.
These solutions of the isoperimetric problem by Steiner
are similar in the sense that they depend upon a modification
of an extremal curve. The procedure is:
(i) assume there exists a solution,
(ii) modify the solution curve in some such way that,
using the fact that it is a solution, we can identify
it geometrically.
Many geometrical problems can be tackled in this way.
The power of the method lies in the vast range of possible
modifications available, the limitation of the method lies in
the fact that there are only a few curves that can be identi-
fied geometrically. The method is also going to be difficult
to apply if there is more than one solution, i. e. if in addition
to a circle there had been some other curve that solved the
isoperimetric problem, our modification would have had
to be chosen so as to distinguish these two curves from all
other convex curves. This is usually not possible. The
modifications used by Steiner were obtained from two
properties of circles:
(i) an angle in a semicircle is \ n, and,
(ii) a circle coincides with its reflection in any line
through its centre.
One of the advantages of Steiner's approach to the
isoperimetric problem is that it gives us a method by which
many other problems can be tackled. An entirely different
approach is due to the Danish mathematician Tom Bonnesen.
The idea behind this approach is to consider not one convex
110
EXPLORING UNIVERSITY MATHEMATICS 1
THE ISOPERIMETRIC PROBLEM
111
curve so much as a family of associated convex curves. The
existence of this family enables us to construct a more
elaborate mathematical structure, which makes more of
mathematics available and leads to a simpler solution.
Consider a convex curve C of length L (see Fig. 7.11).
About each point p of C as centre construct a circle of
Fig. 7.11.
radius x. We get a family of circles whose envelope we
denote by C x . As x varies, taking positive values, we get
a family of curves and it is this family of curves that we
operate with instead of C itself. To begin with, each curve
C x is convex. I shall not prove this. Next if C is of
length L and encloses an area A then C x is of length
L + 2 tix and encloses an area A + Lx + nx 2 . These results
are because C is convex. They may be proved by prov-
ing them first in the particular case when C is a polygon,
and then approximating to a general convex curve C
by convex polygons. The proof when C is a polygon is
easy because then C x is bounded by segments equal and
parallel to the sides of C and by some circular arcs. For
example in Fig. 7.12 the heavily shaded area can be put
together to form a circle of radius x. Their total area is
tix 2 . The lightly shaded areas have an area Lx and the
dotted portion is the set enclosed by C, hence of area A.
I shall not attempt to validate the approximation argu-
ment. The area of C x is A + Lx + tix 2 . This expression
is quadratic in x. We consider it for all values of x both
positive and negative. This is a typical mathematical
manoeuvre. The negative values of x do not correspond
lo any very obvious geometrical entity, but we can use them
to throw light on the behaviour of the quadratic function for
positive values of x. Consider the equation
Fig. 7.12.
nx 2 + Lx + A = 0.
When does this have real roots? Completing the square, the
equation is
L\* L 2 A
x +
2ti
4ti 2 n
TJ
Thus we have real roots if and only if — > A. But this
4:t
112
EXPLORING UNIVERSITY MATHEMATICS 1
THE ISOPERIMETRIC PROBLEM
113
last inequality is the assertion that the area of C does not
exceed the area of a circle of perimeter L, i. e. it is the
assertion that the solution of the isopcrinietric problem is
a circle. Thus we have only to show that for any convex
curve C nx 2 + Lx + A = has real roots in order to solve
the isoperimetric problem. When x is large nx 2 + Lx + A
is large and positive. Thus the equation nx* + Lx + A =
will have real roots if and only if we can find some value
of x (which would be positive or negative) for which
nx 2 + Lx + A is negative or zero. Now there exists in the
area bounded by C a circle of largest possible radius. Let
this largest possible radius be r. Then we shall show that
nx 2 + Lx + A is negative or zero when x = — r. We call
r the inradius of C.
We again consider only the case when C is a polygon.
The general case can be deduced by an approximation
argument that we shall not consider. The largest circle
modification ol C
Fig. 7.13.
.
inside the convex polygon C either touches two opposite
parallel sides, or it touches three sides which, when extended,
form a triangle containing C. We deal with this last case.
The other case is slightly easier and we shall not consider it.
The process is illustrated in Fig. 7.13.
Let I be this incircle of C and let O be its centre. The
three points of contact of I with C divide C into three arcs
de, ef, fd as shown. Suppose that the sides of C containing
d, e, f respectively when produced form the triangle ghj,
then we modify C as follows. Each side of C that lies in
arc ef we move parallel to the line gO towards O until it
touches I or lies on a line touching J. Similarly each side
of C that lies in arc ed we move parallel to jO towards O,
until it lies on a line touching / and finally each side of C
in arc df we move parallel to the line hO towards O until
it lies on a line touching /.
The final figure is not convex but it contains 7 and thus has
an area exceeding nr 2 . Suppose that the sides of C in order
are of length L\, Lo, . . ., L/ k . and that L,- is distant p,- from O
where i = 1, 2, . . ., k. Then if, as before, C encloses an area
A and is of length L,
k k
A = ShLi p t , L = 2 L it
i l
where the first equation is obtained by dividing C into
triangles each with O as a vertex and two adjacent vertices
of C as the other two vertices. When we modified C the
side of length Lf was moved a distance (p; — r) until finally
it was distant r from O. In this motion it described a
parallelogram whose thickness perpendicular to the side of
length Li was (p, — r). Thus this parallelogram was of area
Li (pi — r). The total loss of area in the modification was
k
2 Li (pi — r) and the final area exceeded nr 2 . Hence
A - 2 Li (p { - r) ^ nr 2 ,
114
EXPLORING UNIVERSITY MATHEMATICS 1
i. e.
i. e.
A-2A-
Ttr 2 - rl
tL > nr 2 ,
- A<0.
Thus «** -f Lx + A is negative or zero if x = — r. Hence
it has real roots and thus finally the circle is a solution of
ihe isoperimetric problem.
1 his method is fundamentally different from Steincr's.
We here make no assumption that there actually is a solution.
Our method of proof shows that there is a solution at the
same time that it identifies the circle as being a solution. So
far by this method we have not shown that the circle is the
only possible solution. But we can also do this.
Firstly there is a smallest circle 7i containing C (again
we assume C is a convex polygon and omit the approximation
argument needed in the general case). Either there are two
diametrically opposite points of h that belong to C or there
are three points of 7i that belong to C and form the vertices
of a triangle which contains Ou the centre of l\ t in its
interior. We consider this second case only; the first case
is similar but rather easier. Suppose the three points of 7i
on C are d, e, f. Select points j, g, h lying in the interior of
the angles less than n, ^L dOi c,ZeOif,Z fOi d. Move each
segment of C in arc de parallel to 0\ j away from 0\ until
ii lies on a line touching fr. Perform similarly on the seg-
ments of C in arcs ef, fd using g, h instead of j. Let R be
the radius of If, R is called the circumradius of C. The
final figure includes Ii and encloses an area not less than
ttR 2 . Hence
A + 2Li(R- Pi ) >nR\
1. C.
7i R 2 - RL + A < 0.
Thus n X s + Lx + A is non-positive when x = — R. Hence
the distance apart of the roots of n x 2 + Lx + A = is not
less than R — r. Hence
THE ISOPERIMETRIC PROBI.KM
(V A\\
2 >R-r.
115
Squaring gives L 2 — 4ti A i> tz 2 (R — r) 2 .
Thus L 2 = 4 7i A only if R = r, i. e. C is a circle.
We can prove the isoperimetric inequality for the class
of all convex polygons by an entirely different argument.
We can proceed by an argument which although it modifies
the length and the area concerned, does so in such a way
that the quantity L 2 — 4-tiA (which is known as the
isoperimetric deficit, L being the length and A the area of
the convex set), is reduced. The procedure also reduces
the number of sides of the convex polygon. If follows that
if we can show that the isoperimetric deficit is positive for
a triangle, then it will follow that it is positive for all
convex polygons. This argument is an attractive one because
it makes use of the polygonal nature of the sets concerned.
It will not be possible to apply it to convex sets which are
not polygons but its application to polygons is particularly-
appropriate.
Consider then a convex m-sided polygon, say P, with
length of perimeter L and area A, we shall assume of course
that m is greater than 3. (See Fig. 7.14.) We shall construct
a fc-sided polygon, where k is less than m, say Pi, of length
of perimeter L\, and area A\ such that
L 2 - 4nA>L l 2 - ItiAl
Let Si, . . ., S m be the sides of P in cyclic order, and let
B\, . . ., B m be the interior bisectors of the angles of P, B,
bisecting the angle between S{ and Si + l if Km and B m
bisecting that between S m and S\. Then every point of
Bj has the same distance from Sj as from Sj + i and the
points of the intersection of consecutive bisectors are at the
same distance from at least three sides. As these points of
116
EXPLORING UNIVERSITY MATHEMATICS 1
Fig. 7.14.
intersection are finite in number there is one or more of them
which are at a least distance from the sides of the polygon P.
Suppose that this distance is denoted by X. If we move the
sides of P inwards by an amount X we obtain a new polygon
which we denote by Pi. It has fewer sides than P and its
length L\ and area A\ are related to those of P by the
formula
A = A x + XU + hX (L - U), i. e. A - A x = J X(L + L x ).
Thus we have the relation
(L* - 4 7i A) - (Li» - 4 it At) = (L - U) (L+ U) - 2 nX (L + Lt)
The parts of the sides of the polygon P which are in excess
of the length of the parallel sides of P\ can be put together
to form a polygon circumscribing a circle of radius X. Hence
L — L\ is larger than the perimeter of this circle, i.e. 2 n/..
Thus finally we see that the isoperimetric deficit of P
exceeds that of Pi.
THE ISOPERIMETRIC PROBLEM
117
The idea underlying this argument can be used to obtain
an even more precise result. We have seen that the quantity
L 2 — 4/Ti A is always positive or zero for any convex polygon,
and is zero for a circle. In fact, it is also always positive
for any plane non-circular convex set. We may prove this by
an approximation argument starting from convex polygons.
Thus we can regard the quantity L 2 — 4-tiA as a measure of
the degree to which the given convex set differs from a circle,
the larger this quantity, the less like a circle is the given
convex set, and the set is a circle if and only if this quantity
is zero. It is natural to consider whether there are other
functions which we can define and which have similar
properties. If there are such functions, it may be possible
to compare their value with the value given above. There are
two fairly obvious functions of this nature. One is L — 2 nr,
where r is the inradius of the convex set. This function
is always positive or zero, and is positive unless the set is
a circle. Similar remarks apply to the function 2nR — L,
where R is the circumradius. Thus we may hope to extend
our results by comparing the isoperimetric deficit with some
functions depending upon either L — 2nr or 2tiR — L; in
fact, the isoperimetric deficit is greater than or equal to
(I — 2 7ir) 2 and we can prove this by the following extension
of the previous argument. Let P be a polygon whose sides
are of total length L and which is of area A (see Fig. 7.15).
Denote by P<j the polygon whose sides are parallel to those
of P and at a distance d from those of P measured inwards.
For small values of d, Pa has the same number of sides as
has P. But as d increases, the number of sides decreases.
Denote the length of Pa by La, its area by Ad and its inradius
by r,i- Also denote by fa the area bounded by the polygon
whose sides are parallel to those of P<* and which all touch
a given circle of unit radius. As d increases from zero to r^,
there are a finite number of values, say d\, . . ., d* at which
the number of sides of Pd decreases. Suppose that we had
two values of d, say x and y, both lying between the same
118
EXPLORING UNIVERSITY MATHEMATICS t
Area (, 8'
Perimeter lengTti 2 f, 5
The polygons p tf all have the same incenlre.
Fig. 7.15.
two consecutive d { so that P x and P v have the same number
of sides. Then it is easy to see that if x is greater than y
and x — y = d, then we have the following relations:
A x = A y + L y d + f„d 2 ,
i-'X ~ LJ]/ ~T" 2 f y d,
r x = r„ + <3,
U = fy
By simple algebraic manipulations, we conclude that the
expression
L x r x — A x — f x r x z
is identical with the same expression with .v replaced by .»/.
In other words this quantity is a constant between any two
consecutive values of the d t . Consider what happens when
the variable d is allowed to increase through one of the
values of di. Each of the quantities L dy r dt A d , is a continuous
THE ISOPERIMETRIC PROBLEM
119
function of d. The function fa increases at d, because the
number of sides of P d decreases. Hence the expression
given above decreases at di, and thus it decreases as d
increases from zero to r. As d tends to r, L d , r d and A d
all decrease to zero and f d maintains a certain value,
thus f d rd 2 also tends to zero. In other words, the quantity
which we have above tends to zero, but it is decreasing and
must therefore always be greater than zero. In particular.
take d = 0; we deduce that
Lor — Aq — for 2 > 0, and since fo ^ n, Lq = L, Aq = A,
we have Lr — A — nr 2 > 0.
Hence L z — 4-tzA ^ (L — 2 nr) 2 and this is the result which
we had to prove. This result is also contained in Bonnesen's
argument.
The Greeks never really got to grips with the isoperimetric
problem. Their difficulties were of a different type from
those that they encountered in their other famous unsolved
problems, the duplication of the cube, the trisection of an
angle and the squaring of a circle. These last three problems
were unsolved because they were, and are, insoluble. It is
just impossible to construct a segment of length 2* f *L given
a segment of length L and using only straight edge and
compasses, and similarly for the other problems.
But in the case of the isoperimetric problem the solution
was there and the Greeks knew that it was, and what it
was. But the problem involved such a wealth of complex
concepts that the Greeks were never able to define these
ideas and to give the problem a satisfactory mathematical
formulation.
Today the problem has been generalized out of all
recognition until finally the name isoperimetric problems
has been applied by the distinguished mathematicians Polya
and Szego to problems that are very remote from the original
one but which, being physical phenomena determined by
120 EXPLORING UNIVERSITY MATHEMATICS 1
geometric properties, can be tackled by methods similar to
those of J. Steincr. Some of these can be found in the
references given below.
References for further reading
KoGLESTON, Convexity, Cambridge University Press.
ECGLESTON, Problems in Euclidean Space, Pergamon Piress.
LYUSTERNIK, Convex Figures and Polyhedra.
PrtLYA and SZEGO , Isoperimetric Inequalities in Mathematical
Physics, Princeton University Press.
YAGLOM and BOLTYANSKH, Convex Figures, Holt Reinhart &
Wiston.
The seven lectures included in this
book formed the programme of the
1965 Conference in Mathematics
held at Bedford College, London.
They are published in this volume
because they were given by pro-
fessional mathematicians on
subjects of current mathematical
interest, and will be of value to a
far wider audience than that able to
attend the Conference. The lectures
are primarily designed for students
who are in their last or penultimate
years at school and who will sub-
sequently take a mathematics
degree, orone in which mathematics
forms a major subject.
Publication of the lectures in this
book will also considerably extend
the effect produced by their
presentation at the Conference;
that is, the achieving of an impact
which greatly helps to create
interest by introducing students
to branches of mathematics which
will be novel to them. An interesting
point is that two of the lectures
givea simplified accountof research
work recently presented by the
lecturer.
m
x
-o
o
•t
5'
to
c
3
<"
CO
2
=r
3
..JLI
510